More

piker · 2026-04-22T15:00:05 1776870005

I read two steps as a physical metaphor—i.e., it was following him closely—and not like a chess metaphor—I.e., two moves ahead.

piker · 2026-04-21T20:24:06 1776803046

> A handful of engineers were let go as the result of a re-alignment, and their AI counterparts are actively maintaining their code.

Feel like I'm reading a Gibson novel here.

lazide · 2026-04-21T23:18:18 1776813498

Hint: it’s also fiction

PradeetPatel · 2026-04-22T00:13:23 1776816803

I wish. Check out colleagues.ai as the Chinese equivalent of the programme.

lodovic · 2026-04-22T10:39:22 1776854362

That domain is for sale. This whole thread sounds like one of these "I sell ai agents as a saas and make 30k/month" stories.

lazide · 2026-04-22T03:03:18 1776826998

If that actually replaces your coworker, I feel sorry for everyone.

piker · 2026-04-20T08:41:52 1776674512

Is anyone finding value in these things other than VCs and thought leaders looking for clicks and “picks and shovels” folks? I just personally have zero interest in letting an AI into my comms and see no value there whatsoever. Probably negative.

TheDong · 2026-04-20T08:50:15 1776675015

I find some value as kinda a better alexa.

I have it hooked up to my smart home stuff, like my speaker and smart lights and TV, and I've given it various skills to talk to those things.

I can message it "Play my X playlist" or "Give me the gorillaz song I was listening to yesterday"

I can also message it "Download Titanic to my jellyfin server and queue it up", and it'll go straight to the pirate bay.

It having a browser and the ability to run cli tools, and also understand English well enough to know that "Give me some Beatles" means to use its audio skill, means it's a vastly better alexa

It only costs me like $180 a month in API credits (now that they banned using the max plan), so seems okay still.

swiftcoder · 2026-04-20T09:34:46 1776677686

> It only costs me like $180 a month in API credits (now that they banned using the max plan), so seems okay still.

I have a hard time imagining how much better Alexa would have to be for me to spend $180/month on it...

miroljub · 2026-04-20T10:21:54 1776680514

Just to clarify to people focusing on the $180/month price tag.

OpenClaw is not a CC-only product. You can configure it to use any API endpoint.

Paying $180/month to Anthropic is a personal choice, not a requirement to use OpenClaw.

ThunderSizzle · 2026-04-20T10:46:45 1776682005

So that leads to a question: Is there a physical box I could buy that an amortize over 5-7 years to be half the API cost?

In other words, assuming no price increase, 7 years of that pricing is $15k. Is there hardware I could buy for $7k or less that would be able to replace those API calls or alternativr subs entirely?

I've personally been trying to determine if I should buy a new GC on my aging desktop(s), since their graphic cards can't really handle LLMs)

ekidd · 2026-04-20T11:07:13 1776683233

You can't realistically replace a frontier coding model on any local hardware that costs less than a nice house, and even then it's not going to be quite as good.

But if you don't need frontier coding abilities, there are several nice models that you can run on a video card with 24GB to 32GB of VRAM. (So a 5090 or a used 3090.) Try Gemma4 and Qwen3.5 with 4-bit quantization from Unsloth, and look at models in the 20B to 35B range. You can try before you buy if you drop $20 on OpenRouter. I have a setup like this that I built for $2500 last year, before things got expensive, and it's a nice little "home lab."

If you want to go bigger than this, you're looking at an RTX 6000 card, or a Mac Studio with 128GB to 512GB of RAM. These are outside your budget. Or you could look at a Mac Minis, DGX Spark or Strix Halo. These let you bigger models much slower, mostly.

ThunderSizzle · 2026-04-20T15:46:56 1776700016

Thanks. That is what I suspected. The 3090's in my area seem pretty expensive for a several year old second hand card - they are the same price as a new 5080.

5090 is pretty expensive (~$4000) to justify it over a $10-50 sub. I guess the nice thing is the api side becomes "included", if I ever want to go that route. But if I have a GHCP $40 sub vs a $4000 GC to match it, just on hardware, pay off is at 8 years. If I add in electricity, pay off is probably never.

Sure, the sub can go up in price, but the value proposition for self-running doesn't seem to make sense - especially if I can't at least match Sonnet on GHCP or something like that.

I hope to self-run some not useless LLMs/Agents at some point, but I think this market needs to stabalize first. I just don't like waiting.

ekidd · 2026-04-22T07:50:01 1776844201

For what it's worth, eBay in the US currently has some used 3090s for about $1,300, including some marked "Buy it now." I got mine used for about $1,000, and I'm really happy with it—it's a very solid gaming card for Steam on Linux (if you don't need ray tracing), and it allows me to experiment with models up to about 35B parameters. I'm not saying it's a good investment for you in particular, of course! But it's solid at that price, and you can just chuck it in any consumer gaming rig and get a really fun AI "home lab".

As for models, I'm really genuinely impressed with Gemma4 26B A4B and Qwen3.6 35B A3B right now. Between them, I've seen solid image analysis, good medium-image OCR on very tough images, very good understanding of short stories, good structured data extraction from documents, extremely good language translation, etc. If you wanted to build a custom tool which summarized your inbox/RSS feeds/local news every day, or extracted information from emails and entered it into a database, or automatically captioned images, those tasks are all viable locally. The quality of the results is up dramatically in the last 12 months. At this point, my old personal non-agentic LLM benchmarks are "saturated": All the current leading models score extremely well on literally anything I was asking last year.

It's the true agentic coding workflows where the big models really stand out. And those models are all large enough that the hardware needs to amortized over enough users to run 24 hours/day.

happyopossum · 2026-04-20T17:42:04 1776706924

> or a Mac Studio with 128GB to 512GB of RAM. These are outside your budget.

M3 ultra with 80GOu cores and 256GB of ram is $7500 - that’s right at the edge of the budget, but it fits.. if you can get an edu discount through a kid or friend you’re even better off!

TheDong · 2026-04-20T11:21:30 1776684090

You can buy a roughly $40k gpu (the h100) which will cost $100/mo in electricity on top of that to get about 30-80% the performance of OpenAI or Anthropic frontier models, depending what you're doing.

Over 5 years, that works out to ~$45k vs ~$10k, and during that duration, it's possible better open models will come available making the GPU better, but it's far more likely that the VC-fueled companies advance quicker (since that's been the trend so far).

In other words, the local economics do not work out well at a personal scale at all unless you're _really_ maxing out the GPU at close to 50% literally 24/7, and you're okay accepting worse results.

As long as proprietary models advance as quickly as they are, I think it makes no sense to try and run em locally. You could buy an H100, and suddenly a new model that's too large to run on it could be the state of the art, and suddenly the resale value plummets and it's useless compared to using this new model via APIs or via buying a new $90k GPU with twice the memory or whatever.

vrganj · 2026-04-20T11:27:46 1776684466

This feels like it should be state infrastructure, the way roads, railroads and the postal system are.

dsr_ · 2026-04-20T12:03:43 1776686623

This feels like a market which hasn't settled into long-term profitability and is being subsidized by investors.

happyopossum · 2026-04-20T17:43:21 1776707001

And who is doing the research on this, training the models, and building new frontier models in your version of the world?

TheDong · 2026-04-20T11:38:28 1776685108

Note that the (edit: US) postal system is a for-profit system.

Given the trends of the capitalist US government, which constantly cedes more and more power to the private sector, especially google and apple, I assume we'll end up with a state-run model infrastructure as soon as we replace the government with Google, at which point Gemini simply becomes state infrastructure.

fineIllregister · 2026-04-20T12:24:09 1776687849

> Note that the (edit: US) postal system is a for-profit system.

That's not correct. If USPS makes more revenue than their expenses for a year, they can't pay it out as profits to anyone.

It's true that USPS is intended to be self-funded, covering it's costs through postage and services sold, and not tax revunue. That doesn't mean there's profit anywhere.

Supermancho · 2026-04-20T17:20:17 1776705617

> Note that the (edit: US) postal system is a for-profit system.

Pricing in the US postal system is not based on maximizing profit. Ths US postal system is not a for-profit system, at all. It is a delivery system (more or less) that happened to start turning a profit (2006) until PAEA. After that, the next time it made a profit was 2025.

happyopossum · 2026-04-20T17:44:38 1776707078

The USPS is self funding, not for-profit. The difference is both significant and consequential.

vrganj · 2026-04-20T11:44:51 1776685491

> Note that the postal system is a for-profit system.

That depends on the country in question :-)

zozbot234 · 2026-04-20T13:49:22 1776692962

For something like OpenClaw you realistically only need rather slow inference, so use SSD offload as described by adrian_b here: https://news.ycombinator.com/item?id=47832249 Though I'm not sure that the support in the main inference frameworks (and even in the GGUF format itself, at least arguably) is up to the task just yet.

wasfgwp · 2026-04-20T12:37:16 1776688636

You can use several times cheaper models than Claude as well, its not like you need anything big to handle all the uses cases listed above

swiftcoder · 2026-04-20T12:43:00 1776688980

Yeah, something like MiniMax m2.7 should be perfectly capable for this sort of thing, and is 10-20x cheaper

BirAdam · 2026-04-20T17:00:35 1776704435

You can get quite good models running on a Mac Studio, but these will not rival a frontier model.

$3,699.00

M4 Max 16c/40c, 128GB of RAM, 1TB SSD.

LM Studio is free and can act as a LLM server or as a chat interface, and it provides GUI management of your models and such. It's a nice easy and cheap setup.

rcxdude · 2026-04-20T11:07:35 1776683255

For something the size of Claude, probably not. But for smaller models, maybe (though they also are much cheaper to buy tokens for)

TheDong · 2026-04-20T11:33:25 1776684805

I mean, I'm getting $180/mo worth of fun out of playing with it and figuring out what it can do that it's worth it.

Like, no one bats an eye at all the people paying $100/mo for Hulu + Live TV, or paying $350/mo for virtual pixels in candy crush / pokemon go / whatever, and I'm having at least that much fun in playing with openclaw.

hunter-gatherer · 2026-04-20T11:57:37 1776686257

Everyone in my circle would seriously bat an eye at all those numbers. Congrats on making it to the upper class.

LeifCarrotson · 2026-04-20T14:25:12 1776695112

In my circle you'd get called out for taking on a $350 car payment much less a mobile game.

pydry · 2026-04-20T11:49:30 1776685770

I think quite a lot of people would bat an eyelid at those things.

If any of my friends admitted to spending $350/mo on candy crush i'd think that they'd badly need help for a gambling problem.

MisterTea · 2026-04-20T15:16:24 1776698184

I think paying $180/month because you don't want to walk 10 feet to a light switch or forgot the name of a 25 yo Gorillaz song you just heard is absurdly stupid.

whilenot-dev · 2026-04-20T12:28:19 1776688099

Just for reference: I pay 8€ for mobile, 40€ for internet and some occasional 5€ for VPNs each month. That's all the digital service subscriptions I'll need to have fun.

nickthegreek · 2026-04-20T14:20:39 1776694839

You could be doing for ALOT cheaper using something like minimax m2.7 for subagents. You dont need to be throwing all that cash out the door.

icedchai · 2026-04-20T14:40:52 1776696052

What are you using it for, seriously?

The things I want to use it for (like gathering weekly reports across a half dozen brokerage and bank accounts) are not things I'd trust it to do.

vovavili · 2026-04-20T10:29:21 1776680961

I do see how a very busy businessman or a venture capitalist would gladly pay 180$/month to offload chores and mundane work from his schedule. That comes down to 6$/month, which probably matches his monthly coffee budget.

ThunderSizzle · 2026-04-20T10:48:26 1776682106

Chores, yes. If there was a $180/month where ALL my families chores could be accomplished, I'd consider it.

That means picking up and cleaning the house after 3 kids and a dog. Grocery shopping. Dishes. Laundry. Chores.

Tech crap? Nope.

vovavili · 2026-04-20T11:18:08 1776683888

I would imagine that the list of digital chores of a very busy businessman are a bit more extensive. Even in your list, groceries is something that becomes digital once you're high enough in income.

StilesCrisis · 2026-04-20T11:28:21 1776684501

My grocery store has offered a pick-up or delivery option ever since COVID. Pick-up actually cost nothing extra. It's been years since we used it so I can't say definitively that it's still free, but the downside wasn't cost: it was the ability to pick the best item. If you let the store choose, you'll get the saddest looking produce every time, and the meat that's set to expire tomorrow.

vovavili · 2026-04-20T12:21:49 1776687709

To each his own.

StilesCrisis · 2026-04-20T14:13:48 1776694428

Does anyone pick the soggy vegetables and near-expired milk? This isn't really a preference--it's the store choosing what's in their best interest instead of your own.

mh- · 2026-04-20T15:50:10 1776700210

We have our groceries delivered every week (sometimes semi-weekly), and have done since 2018 or so. The people who pick the order work for the store, the people who deliver are gig workers.

The only "selection" complaint I regularly have had is the bananas are nearly always very unripe - like several days from being edible. But then I went to the store myself for several weeks and realized they just never have ripe bananas.

In other words, they're doing as well as I could do if I were shopping it myself.

ThunderSizzle · 2026-04-20T15:23:07 1776698587

Not really. Groceries have to be planned based on existing pantry state (current manual analysis), and future desired meals. Then produce a delta of what you have and what you want for those different meals.

Then you have a shopping list. You can do the shopping digitally now a days, but once it's delivered, now you have to organize it into the pantry existing stock, probably with a way to ensure older items are used first. This might involve separating out certain ingredients into smaller packaging and freezing some for later use.

That is all very manual, and I don't see how digitizing one part greatly simplifies it, especially if the digitization is error prone.

In a high enough income state, the answer is you hire a personal household chef or something like that. That isn't digitizing the problem- that is outsourcing it.

retired · 2026-04-20T09:19:08 1776676748

> It only costs me like $180 a month in API credits

In The Netherlands you can get a live-in au-pair from the Philippines for less than that. She will happily play your Beatles song, download the Titanic movie for you, find your Gorillaz song and even cook and take care of your children.

It's horrible that we have such human exploitation in 2026, but it does put into perspective how much those credits are if you can get a real-life person doing those tasks for less.

quietbritishjim · 2026-04-20T09:27:23 1776677243

I'm surprised to read that. Here in the UK, having a live-in au pair doesn't excuse you from paying the minimum wage for all the hours that they're working (approx $2300/month for a 35 hour week). You can deduct an amount to account for the fact that you're providing accomodation but it's strictly limited (approx $400/month).

swiftcoder · 2026-04-20T09:33:16 1776677596

The Netherlands has a weird and exploitative setup where you can classify your au pair as a "cultural exchange", and then pay them literal peanuts (room and board plus a token amount of "pocket money")

__alexs · 2026-04-20T10:14:13 1776680053

Another weird cultural quirk of the Dutch that will hopefully go the way of Zwarte Piet one day.

retired · 2026-04-20T09:35:51 1776677751

From what I can see online, the average compensation that an au-pair in The Netherlands receives is 300 euro per month, with living expenses being covered by the family. There is no minimum wage requirement for au-pairs like in the UK or the US.

aianus · 2026-04-20T10:23:20 1776680600

A semi-skilled English-speaking customer service agent in PH makes less than $700 a month to put this into perspective.

Working abroad is a totally reasonable proposition compared to working in the Philippines.

spockz · 2026-04-20T10:48:47 1776682127

The added cost of having an additional person to provide room and food for way exceeds that €300/month. Especially, when taking into consideration that you might have to extend/renovate the house to lodge another person. Adding an extra bedroom and possibly bathroom is not cheap.

jjcob · 2026-04-20T11:03:49 1776683029

Even if you assume the cost of lodging was 1000€ (which it isn't) then the au-pair would still be significantly underpaid.

A normal full time employee costs at least 2000€ a month (salary, tax, pension plan, health insurance, etc). If you are paying less than that you are definietly exploiting them.

throwthrowuknow · 2026-04-20T10:58:46 1776682726

So in reality you’re paying for their food, electricity and heat, letting them rent a room for free, and allowing them the use of the other facilities in your home and on top of that you’re giving them a spending allowance of 300 euro.

swiftcoder · 2026-04-20T12:40:16 1776688816

The marginal cost of food/electricity/bed for adding one additional person to a family is drastically less than those things would cost for a person living alone. Whichever way you slice this, the employer is making out like a bandit under this scheme.

balamatom · 2026-04-20T12:15:03 1776687303

In fact, you could do this for a homeless person today, in any city on the globe! And never even ask them to do anything for you!

kombine · 2026-04-20T09:28:24 1776677304

We shouldn't have to "import" people from poorer countries to do the mundane tasks we got too lazy to do ourselves.

grosswait · 2026-04-20T11:03:16 1776682996

The concept of having this kind of help is totally foreign to me, but with the exception of one, every family I’ve encountered that had an au pair have been two very busy high earning parents, neither of them lazy. I think you could argue that perhaps priorities have been misplaced, but not lazy.

DrewADesign · 2026-04-20T09:36:37 1776677797

Surely that’s subsidized?

A lot of people in the Silicon Valley area spend that much ($6/day) on coffee. What they don’t realize is how out of touch they are in thinking makes sense for the rest of the fucking world. $180/mo is about 5% of the median US per capita income. It’s not going to pick your kids up from school, do your taxes, fix your car, or do the dishes. It’s going to download movies and call restaurants and play music. It’s a hobby, high-touch leisure assistant that costs a lot of money.

duskdozer · 2026-04-20T10:28:31 1776680911

They aren't selling it to the median US earner. They're selling it (and trying to generate FOMO) to the out of touch people so that it becomes so entrenched that the median earner will be forced to use it in some capacity through their interaction with businesses, schools, the government, etc.

DrewADesign · 2026-04-20T14:07:21 1776694041

The customer they’re picturing in their mind’s eye is obvious. The out of touch part comes in when you look at the size of that market— not big— with how likely that market is to grow drastically— not very— and the amount they’re investing in building the product— all of everything plus a bazillion. With what they’ve invested, if they end up with an institutional market the likes of Microsoft split up among the winners, they fucked up.

The economics of these businesses are based way more on hope and hype than rational analysis and planning.

wasfgwp · 2026-04-20T12:44:03 1776689043

Realistically you certainly don’t Anthropic’s models for those things and can get something for a fraction of the price on OpenRouter/etc.

rjh29 · 2026-04-20T14:40:16 1776696016

Wow. I'd expect that from Singapore or UAE but finding it happen in a fairly developed Western country is a surprise.

vovavili · 2026-04-20T11:15:12 1776683712

Machines don't get tired, don't have to sleep, don't face principal-agent problems and can accumulate Skill.md instructions for decades without getting replaced. I definitely see the potential of something like OpenClaw for those who can afford it.

cameronh90 · 2026-04-20T10:49:50 1776682190

You're paying the au pair partly in accommodation, food, bills and a visa. The visa isn't coming out of your bank account, but it's definitely part of the incentive, so you could see it as a government subsidy.

For comparison, a full time "virtual assistant" with fluent English from the Philippines costs upwards of $700/month nowadays.

BigTTYGothGF · 2026-04-20T14:42:47 1776696167

> In The Netherlands you can get a live-in au-pair from the Philippines for less than that

What a horrible situation.

CalRobert · 2026-04-20T10:29:43 1776680983

How is that remotely possible without committing enormous violations of labor law?

throwatdem12311 · 2026-04-20T11:52:21 1776685941

Framed this way - then “replacing” this kind of human exploitation is definitely a good for humanity. If someone doing a job is practically a slave, then replacing them with an electron to token converter is a good thing.

The number one goal of AI should be to eliminate human exploitation. We want robots mining the minerals we use for our phones, not children. We should strive to free all of humanity from dangerous labour and the need for such jobs to exist.

If Elon Musk wants Optimus robots to help colonize Mars shouldn’t he be trying to create robots that can mine cobalt or similar minerals from dangerous mines and such?

esseph · 2026-04-20T13:07:29 1776690449

> The number one goal of AI should be to eliminate human exploitation.

I have some bad news.

_zoltan_ · 2026-04-20T09:56:43 1776679003

I doubt this is true in .nl. 180 a month is low for a live-in au-pair.

huflungdung · 2026-04-20T10:18:53 1776680333

> In The Netherlands you can get a live-in au-pair from the Philippines for less than that.

And you see nothing wrong with that?

tikotus · 2026-04-20T09:05:45 1776675945

I don't want to be judgemental, but I do find it funny that you're paying $180 for this convenience, and use it to pirate movies.

llmocallm · 2026-04-20T10:20:40 1776680440

Then allow me to be judgemental in your stead. I've done a similar setup as the above and completely locally. I dunno how they're paying so much, but that's ridiculously overpriced.

TheDong · 2026-04-20T11:46:40 1776685600

All the other models performed much worse for the skills I'm using. I tried gpt-5.1 (and then 5.4 again recently), and also tried pointing it at OpenRouter and using a few of the cheaper models, and all of them added too much friction for me.

Be judgemental all you want, but I feel like I'm paying for less friction, and also more security since my experiments also showed claude to be the least vulnerable to prompt injection attempts.

wasfgwp · 2026-04-20T12:46:16 1776689176

> models performed much worse for the skills I'm using

Hard to believe unless your are doing something much more complex than the things you listed

LeCompteSftware · 2026-04-20T09:33:54 1776677634

Let's also point out the $180 is going to a hideously evil AI company which pirated millions of books and movies.

simonkagedal · 2026-04-22T05:11:56 1776834716

In a possible defense of grandparent, whenever I pirate movies these days (seldomly), it would be not because I don’t want to pay, but either because I want the offline reliability or because I just can’t find it elsewhere.

(The latter would however not be the case for Titanic, I imagine.)

TeMPOraL · 2026-04-20T09:27:57 1776677277

It's not the only thing they're doing with it. I mean, the logic is sound - $180 goes into automating bunch of manual processes in personal life, one of which is getting movies, which in some cases involves going out on the high seas.

puelocesar · 2026-04-20T08:55:49 1776675349

180 grand a month for PA is a lot of money. But I guess each person has its own priority. I mean, I can pay a very fancy gym with that price instead of the shitty popular one I go, which would probably improve my well being much more than asking to play Gorillaz

quietbritishjim · 2026-04-20T09:19:23 1776676763

"a grand" means a thousand (dollars or pounds or whatever). $180k / month really would be a lot of money. I'd be your PA for that!

puelocesar · 2026-04-20T13:59:41 1776693581

ok, now I wish there was an edit button. Thanks for noticing the mistake though

bluedel · 2026-04-20T09:23:17 1776676997

Am I right to be a little concerned by the phrase "it'll go straight to the pirate bay"?

Not to be a narc or anything, but is OpenClaw liable to just perform illegal acts on your behalf just because it seemed like that's what you meant for it to do?

esseph · 2026-04-20T13:10:03 1776690603

> Not to be a narc or anything, but is OpenClaw liable to just perform illegal acts on your behalf just because it seemed like that's what you meant for it to do?

There's at least a couple of dozen instances right now, somewhere, getting very close to designing boutique chemical weapons.

jappgar · 2026-04-20T11:08:13 1776683293

Seems like the only people using pirate bay in 2026 are "privacy obsessed" rich middle-aged guys.

I think they do it mostly to feel young and edgy.

Supermancho · 2026-04-20T17:23:55 1776705835

I use it to get media for my family to watch on any tv using Plex. Sometimes I get books/manuals etc.

Hendrikto · 2026-04-20T10:13:02 1776679982

180$/month to queue playlists does not “seem okay” at all. We must be living in different worlds.

xorcist · 2026-04-20T13:53:21 1776693201

> I can message it "Play my X playlist"

People do this? Or is it some sort of joke way above my head?

In what bizarre world is it easier to ask a massive LLM to play a playlist rather than ... literally hitting the play key on it?

chasd00 · 2026-04-20T19:43:03 1776714183

One where there's 50 playlists and your hands are wet because you're right in the middle of doing dishes. Besides, your phone is in the other room anyway.

kirubakaran · 2026-04-20T20:34:13 1776717253

I'd use a towel for such purposes :)

jappgar · 2026-04-20T11:05:15 1776683115

You're spendin 180 a month on tokens and still refusing to buy media like Titanic?

TheDong · 2026-04-20T11:43:57 1776685437

If you've figured out how to pirate Anthropic's models and enough GPUs to run it for less than my API costs, I'm all ears

janderson215 · 2026-04-20T13:55:05 1776693305

While I love the idea of using it for home/personal automation (and it sounds like you've done a good job executing it), this comment makes it seem like avoiding paying for The Titanic is almost as important as having an OpenClaw-driven assistant/automation system.

jappgar · 2026-04-21T15:28:42 1776785322

I haven't but I have figured out how to pay creators for media I enjoy. I spend less than 180 a month too.

qsera · 2026-04-20T10:24:16 1776680656

I have the almost same thing using a network connected raspberry-pi and no AI.

philipallstar · 2026-04-20T12:25:42 1776687942

> "Download Titanic to my jellyfin server and queue it up", and it'll go straight to the pirate bay

You could build up a legitimate collection for much less than $180/mo.

coldtea · 2026-04-20T11:45:22 1776685522

Regarding Alexa, none of those use cases sound that useful to have an ever-present listening device at home, except if one is bedbound or something.

tempaccount5050 · 2026-04-20T12:01:14 1776686474

Using OpenClaw for that is nuts. Claude or GPT could just one shot an app for you that does all that and uses 0 tokens once you've built it.

vbezhenar · 2026-04-20T09:45:29 1776678329

Many wealthy people use human assistants to offload mundane work.

This is cheap replacement for ordinary people.

It's going to be big. But probably it's best to wait for Google and Apple to step up their assistants.

piker · 2026-04-20T09:54:12 1776678852

Yes, and that's because the workflow of those people generally requires managing a crazy, dynamic schedule including travel, meetings, comms, etc. Those folks need real humans with long-term memories and incentives to establish trust for managing these high-stakes engagements. Their human assistants might find these things useful, but there's zero chance Bill Gates is having an AI schedule his travel plans or draft his text messages.

OTOH, this isn't an issue for "ordinary people". They go to work, school, children's sports events, etc. If they had an assistant for free, most of them would probably find it difficult to generate enough volume to establish the muscle memory of using them. In my own professional life, this occurred with junior lawyers and legal assistants--the juniors just never found them useful because they didn't need them even though they were available. Even the partners ended up consolidating around sharing a few of them for the same reason.

Down in this thread someone mentions it being an advanced Alexa, which seems apt. Yes, a party novelty but not useful enough to be top of mind in the every day work flow.

Terr_ · 2026-04-20T11:12:01 1776683521

Side rant: A disproportionate amount of AI assistant marketing involves scenarios that look middle class, but actually require customers wealthy enough risk money on errors. Like buying the wrong thing, or even buying the right thing at the wrong price.

nainachirps_ · 2026-04-20T10:13:16 1776679996

I am ordinary people. I have adhd. I have been dying for assistance in scheduling and planning. Am not employed enough to afford hiring a human yet. Am hopeful these will reach maturity for me to he able to host one on my own device. Or find a private provider with good security model and careful data handling.

user_7832 · 2026-04-20T10:37:39 1776681459

Not +1, but +100 to your comment (fellow ADHD'er here). Even a virtual friend who'd help me stay on track would be excellent, and if I had a physical human assistant... that would legitimately make many aspects of my life much better. (Simple example: I could ask them to nag me to exercise.)

vbezhenar · 2026-04-20T10:10:24 1776679824

Going to the shop and buying groceries is not hard work. But I don't do that since delivery became available. I'm lazy and delivery is free. Same for ordinary people needs. It's not a big deal to manage my life, but if I can avoid doing that for free, that's probably what I'll do. For $200? Not sure. For $20? Absolutely. So the question is already about price.

spockz · 2026-04-20T10:40:03 1776681603

Off-Topic: Are you sure delivery is free? When comparing prices online vs my local supermarket of the same brand, online prices trend higher. Locally the store also has more products on sale than available online. Only recently online shopping has become slightly cheaper because they now have “bulk” deals for 5-20% discount.

andai · 2026-04-20T10:40:55 1776681655

I'm not sure how solvable it is. It only takes one screw up to ruin the reputation, and a screw up is basically guaranteed.

The tech has existed for a while but nobody sane wants to be the one who takes responsibility for shipping a version of this thing that's supposed to be actually solid.

Issues I saw with OpenClaw:

- reliability (mostly due to context mgmt), esp. memory, consistency. Probably solvable eventually

- costs, partly solvable with context mgmt, but the way people were using it was "run in the background and do work for me constantly" so it's basically maxing out your Claude sub (or paying hundreds a day), the economics don't work

- you basically had to use Claude to get decent results, hence the costs (this is better now and will improve with time)

- the "my AI agent runs in a sandboxed docker container but I gave it my Gmail password" situation... (The solution is don't do that, lol)

See also simonw's "lethal trifecta":

>private data, untrusted content, and external communication

https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

The trifecta (prompt injection) is sorta-kinda solved by the latest models from what I understood. (But maybe Pliny the liberator has a different opinion!)

feigewalnuss · 2026-04-20T14:38:36 1776695916

Disclosure: I wrote the linked post.

The "gave it my Gmail password" problem has a better answer than "don't do that." Security kicks itself out of the room when it only says no. Reserve the no for the worst days. The rest of the time, ship a better way.

That's why I built the platform to make credential leaks hard. It takes more than a single prompt. The credential vault is encrypted. Typed secret wrappers prevent accidental logging and serialization. Per-channel process isolation means a compromise in one adapter does not hand an attacker live sessions in the others.

"Don't do that" fails even for users trying their hardest. Good engineering makes mistakes hard and the right answer easy. Architecture carries the weight so the user does not have to.

On the trifecta being "sorta-kinda solved" by newer models, no. Model mitigations are a layer, not a substitute. Prompt injection has the shape of a confused-deputy problem and the answer to confused deputies has always been capabilities and isolation, not asking the already confused deputy to try harder.

You want the injection to fail EVEN when the model does not catch it.

andai · 2026-04-20T15:23:59 1776698639

Thanks. Yeah, I skipped that part in my comment, there are solutions for a lot of this stuff.

The one I see the most is brokers. Agent talks to a thing, thing has credential and does the task for the agent. Or proxies that magically inject tokens.

I think this only works for credentials though?

It doesn't solve the personal information part (e.g. your actual emails), right?

As for security, my solution was: keep it simple and limit blast radius.

Expect it to blow things up, and set things up so it doesn't matter when it happens.

I don't like docker so I just made a Linux user called agent. Agent can blow up all the files in its own homedir, and cannot read mine.

I felt really clever until I realized there's an even better solution: just give it a laptop (or Mac mini, or server, or whatever we're doing this week).

Same result but less pain in my ass. Switching users is annoying (and sharing files, and permission issues...). Also, worrying about which user I'm running stuff as... The thing just shouldn't be on my machine in the first place. It should have its own!

Functionally, its own Linux user or root on a $3 VPS are the same thing. It blows up the VPS, I just reset it.

For keys, I don't do anything fancy. It can leak all my keys. But if anyone steals them, they can exhaust my entire $5 prepaid balance ;) Blast radius limited.

But yeah, needs, tastes and preferences may differ.

feigewalnuss · 2026-04-21T04:28:30 1776745710

Right, we have to see credentials and personal data as different problems. Wirken addresses the first directly and only partially the second. Session scoping keeps injection damage inside one channel's scope so a poisoned email cannot reach into your Telegram credentials. The model still reads the email content during that session, and any prompt injection in that content can still act within what just that session can reach.

The layer that addresses content-level flow is information-flow enforcement above identity. TriOnyx (https://github.com/tri-onyx/tri-onyx) looks at that exact problem: taint and sensitivity tracking, gateway kills on threshold breach.

It complements Wirken. You need identity before you can meaningfully ask what agent A has been exposed to.

On the agent-gets-its-own-machine approach, that is fine as a blast-radius strategy and I have no quarrel with it. It trades isolation between channels for isolation between the agent and the host. If you only have one channel and disposable keys, it works. It stops working as soon as the agent holds something you cannot cheaply rotate, which for most people ends up being their messaging identities.

andai · 2026-04-21T23:28:51 1776814131

You mean like giving Claude your HN password? ;)

eloisant · 2026-04-20T10:33:48 1776681228

$180 a month is huge for "ordinary people".

So I guess that leaves the in-between people who don't care about spending $180 every month but don't have any personal staff yet or even access to concierge services.

LeCompteSftware · 2026-04-20T10:56:19 1776682579

The problem is that if you're wealthy enough to hire someone to do your errands, those errands likely aren't very mundane - the exception is a socialite giving their friend a low-effort job, but executive assistants are paid well because their jobs are cognitively demanding.

OTOH a lower-middle-class Joe like me really does have a lot of mundane social/professional errands, which existing software has handled just fine for decades. I suppose on the margins AI might free up 5 minutes here or there around calendar invites / etc, but at the cost of rolling snake eyes and wasting 30 minutes cleaning up mistakes. Even if it never made mistakes, I just don't see the "personal assistant" use case really taking off. And it's not how people use LLMs recreationally.

Really not trying to say that LLM personal assistants are "useless" for most people. But I don't think they'll be "big," for the same reason that Siri and Alexa were overhyped. It's not from lack of capability; the vision is more ho-hum than tech folks seem to realize.

Sam713 · 2026-04-20T16:32:47 1776702767

Siri is quite bad though. Personally, I would get a lot of value out of a more accurate Siri that could function as a device/personal assistant. Right now, if I prompt Siri to “search calendar app for flights scheduled this month”, it just straight up fails. That should be a relatively simple contextual search; just asking it to pull existing data. Siri/Apple Intelligence is overhyped because it can’t even perform basic functions effectively, or takes more time than just doing the same function manually.

TeMPOraL · 2026-04-20T14:47:18 1776696438

> which existing software has handled just fine for decades

Existing software is what dumped most of those errands on you in the first place.

torginus · 2026-04-20T12:15:27 1776687327

My 2 cents is that so far LLMs have had a bad track record in replacing people in jobs where simple software logic and flowcharts wouldn't do the job.

wesleywt · 2026-04-20T15:17:49 1776698269

Its not going to be big. There is no obvious use for them unlike email or the Iphone.

lionkor · 2026-04-20T10:59:15 1776682755

Those human assistants can be held accountable.

sekh60 · 2026-04-20T12:11:00 1776687060

I deleted your calendar, I'm sorry.

lionkor · 2026-04-21T20:33:13 1776803593

But you BET that person is never, ever making that mistake again!

ZeroGravitas · 2026-04-20T09:36:36 1776677796

I see the appeal, but I also see the risks.

If you ignore the risks I don't see why it's hard to see value.

The AI can read all your email, that's useful. It can delete them to free up space after deciding they are useless. It can push to GitHub. The more of your private info and passwords you give it the more useful it becomes.

That's all great, until it isn't.

Putting firewalls in place is probably possible and obviously desirable but is a bit of a hassle and will probably reduce the usefulness to some degree, so people won't. We'll all collectively touch the stove and find out that it is hot.

theshrike79 · 2026-04-20T12:26:49 1776688009

Just limit the tooling. There's no reason for the AI to be able to delete emails for example.

I built a fastmail CLI tool for my *claw and it can only read mails, that's it. I might give it the ability to archive and label later on, with a separate log of actions so I can undo any operation it did easily.

It's pretty decent at going "hey, there's a sale on $thing at $store", for mails, but that's about it.

greedo · 2026-04-20T17:56:20 1776707780

Deleting email to free up space has to be the most ridiculous justification for burning up the environment.

bitmasher9 · 2026-04-20T12:47:00 1776689220

Yep, I’m seeing real value. I use them for tasks that an assistant might have done in the past. It’s much cheaper than hiring a human, and setup is much faster than finding a good assistant. I’m honestly considering giving it access to accounts with payment information so it can book flights and hotels for me.

You can ask it questions like “what classes does my gym offer between 6-8pm today” and just get a good answer instead of wasting time finding their schedule. You can tell it to check your favorite band’s website everyday to see if they announce any shows in your city. You can tell it to read your emails and automatically add important information to your calendar.

This isn’t the space where I get the most value from AI, but it’s nice to have a hyper connected agent that can quickly take care of more smaller and more personal tasks.

piker · 2026-04-20T13:04:23 1776690263

No offense but all of those are near zero value except entertainment to the orchestrator. That’s without understanding the failure rate and modes. It’s telling that you haven’t yet given it your credit card.

bitmasher9 · 2026-04-20T22:58:19 1776725899

I would agree that the value is low. It’s the type of thing I wouldn’t pay $20/hr to have a human do, if you want to put a value cap on this type of activity.

seniorThrowaway · 2026-04-20T20:23:14 1776716594

Yes, they are good at being coding "interns". They work together in slack which allows management visibility and tasking directly to them. i.e. people who aren't going to be managing a bunch of claude code's all day. I say interns because they are incredibly smart at some things like implementing well defined coding plans and incredibly dumb in other ways, like shipping non-compiling code and asking a human to troubleshoot. To do this requires thoughtful setup, you have to onboard them more like you would a human than a piece of software. Give them their own "workstation", accounts etc. Limit what they can do in those accounts, not in their .md files or skills or anything, they will never follow those 100%, just like a person won't follow directions 100% of the time.

stingraycharles · 2026-04-20T11:19:25 1776683965

This is being asked on pretty much every Openclaw thread, and the use cases brought up seem roughly similar: digital assistant.

It of course depends heavily on your work, but my work is 50% communication / overseeing, and I simply lose track of everything.

I don’t give it any credentials of any sort, but I run data pipelines on an hourly basis that ingest into the agent’s workspace.

littlecranky67 · 2026-04-20T10:46:01 1776681961

I can see a value in a smarter email-inbox sorting algorithm - but only because all major players (except google which I don't trust with my mails) have abandoned bayesian email filtering with training. This was standard in 2005 in such basic clients such as the Opera browser, but somehow we lost this technology along the way.

easygenes · 2026-04-20T11:09:08 1776683348

I was an original Thunderbird pre-1.0 (from 2003) user and prior to that, Netscape Mail, and am quite certain it has had bayesian spam filtering all this time, at least since the late ‘90s. That was a headline feature in the early days. My first email account used POP3 through a shared web host for my own domain in that era.

Edit: Yes it’s still there https://support.mozilla.org/en-US/kb/thunderbird-and-junk-sp...

Terr_ · 2026-04-20T11:06:13 1776683173

I can't recall the name, but I vaguely remember a Bayesian spam filter for arbitrary POP3 accounts in the 2000s that had a local web frontend, and how excited I was at its effectiveness.

I believe that the shift from "my one computer" to multiple clients (computer + phone + webmail) probably has something to do with it. Even with IMAP sharing state, you still don't have a great way to see and control the filtering, except by moving things in/out of spam folders.

pizza234 · 2026-04-20T11:11:28 1776683488

> Is anyone finding value in these things other than VCs and thought leaders looking for clicks and “picks and shovels” folks?

Mostly (but of course, not exclusively), porn for the techies. Receiving a phone notification every time a PR is opened on a project of yours? Exciting or sad, depends on one's outlook on life.

moffkalast · 2026-04-20T11:19:34 1776683974

I thought emails from github already did that?

mgkimsal · 2026-04-20T11:26:45 1776684405

I think the more useful part is the parts that checks a ticket, fixes a bug, then opens the PR automatically. Whether you get an email or a phone text or call from a voice agent is ... somewhat secondary, im.

moffkalast · 2026-04-20T14:21:29 1776694889

Fixes a bug, yeah that's some wishful thinking right there. I don't think it counts if it adds three new ones in the process.

eucyclos · 2026-04-20T23:32:21 1776727941

If it added four last month I'd say it absolutely counts. Trends often matter more than position.

_pdp_ · 2026-04-20T08:44:13 1776674653

There is value but it is hard to discover and extract outside of a few known areas - like coding, etc.

piker · 2026-04-20T08:47:12 1776674832

Yes, I can see the (potential) value in working with agents in software development. The “claw” movement I understood to suggest value in less constrained access to my inbox, personal messages, calendar etc like some sort of PA. It’s hard to quantify how much damage a bad PA can do to someone’s personal and professional life, so if my understand is correct, this seems like a dead end.

_pdp_ · 2026-04-20T09:41:59 1776678119

I posted this comment in another thread so reposting it here because it seems to be on topic.

---

IMHO, the biggest problem with OpenClaw and other AI agents is that the use-cases are still being discovered. We have deployed several hundred of these to customers and I think this challenge comes from the fact that AI agents are largely perceived as workflow automation tools so when it comes to business process they are seen as a replacement for more established frameworks.

They can automate but they are not reliable. I think of them as work and process augmentation tools but this is not how most customers think in my experience.

However, here are a several legit use-case that we use internally which I can freely discuss.

There is an experimental single-server dev infrastructure we are working on that is slightly flaky. We deployed a lightweight agent in go (single 6MB binary) that connects to our customer-facing API (we have our own agentic platform) where the real agent is sitting and can be reconfigured. The agent monitors the server for various health issues. These could be anything from stalled VMs, unexpected errors etc. It is firecracker VMs that we use in very particular way and we don't know yet the scope of the system. When such situations are detected the agent automatically corrects the problems. It keeps of log what it did in a reusable space (resource type that we have) under a folder called learnings. We use these files to correct the core issues when we have the type to work on the code.

We have an AI agent called Studio Bot. It exists in Slack. It wakes up multiple times during the day. It analyses our current marketing efforts and if it finds something useful, it creates the graphics and posts to be sent out to several of our social media channels. A member of staff reviews these suggestions. Most of the time they need to follow up with subsequent request to change things and finally push the changes to buffer. I also use the agent to generate branded cover images for linkedin, x and reddit articles in various aspect ratios. It is a very useful tool that produces graphics with our brand colours and aesthetics but it is not perfect.

We have a customer support agent that monitors how well we handle support request in zendesk. It does not automatically engage with customers. What it does is to supervise the backlog of support tickets and chase the team when we fall behind, which happens.

We have quite a few more scattered in various places. Some of them are even public.

In my mind, the trick is to think of AI agents as augmentation tools. In other words, instead of asking how can I take myself out of the equation, the better question is how can I improve the situation. Sometimes just providing more contextually relevant information is more than enough. Sometimes, you need a simple helper that own a certain part of the business.

I hope this helps.

esseph · 2026-04-20T13:21:29 1776691289

> They can automate but they are not reliable.

This is why I won't use them for anything externally facing or with high or even moderate damage potential.

Which basically means they don't get used at all.

_pdp_ · 2026-04-20T13:47:27 1776692847

This is my point ... it is a change of perspective. They are best suited for cooperation for now.

bsenftner · 2026-04-20T12:05:10 1776686710

Great information post. don't let the AI fad boi's down votes lead you to think this is not a very worthwhile contribution.

eucyclos · 2026-04-20T23:30:24 1776727824

Weird that you're being downvoted for sharing anecdata on actual use cases. It's as if there's a desire to downplay the positive aspects of this technology in the HN community.

jstummbillig · 2026-04-20T10:36:41 1776681401

> letting an AI into my comms

Idk, it's strange for me to think of it that way. It's tech. If it does something useful, that's cool.

Data protection is always a consideration. I just don't consider a LLM to be a special case or a person, the same way that I don't have strong feelings about "AI" being applied in google search since forever. I don't have special feelings or get embarrassed by the thought of a LLM touching my mails.

Right now for me, agentic coding is great. I have a hard time seeing a future where the benefits that we experience there will not be more broadly shared. Explorations in that direction is how we get there.

rowanG077 · 2026-04-20T10:43:17 1776681797

The problem for me is not the LLM reading it. The problem is the company behind it can most likely recover the sessions. That is a problem since they could share it with whomever they want. Even if they are fully incorruptable it's also not uncommon that they simply get hacked and all this data ends up on the open market.

piker · 2026-04-20T10:48:32 1776682112

My issues aren’t really with privacy so much as what the failure modes look like, and, more fundamentally, with becoming a passenger to my own life.

onchainintel · 2026-04-20T08:57:41 1776675461

It all depends on what you do aka your use case. If you're in the content creatio business, which is part of my responsibilities, then yes has been massively helpful. For other roles, I can absolutely see no use case or benefit. Context matters, like with everything.

pjmlp · 2026-04-20T10:29:30 1776680970

Same here, I care to the extent I am obligated to, and staying relevant for finding a job.

andai · 2026-04-20T10:35:46 1776681346

It's pretty much just Claude Code, except hooked up to your Telegram / WhatsApp / iMessage.

I don't know why they don't make an official integration for it. Probably cause they're already out of GPUs lol

edschofield · 2026-04-20T11:55:55 1776686155

Claude Code does now have integration with Telegram, Discord, and iMessage as of a few weeks ago:

https://code.claude.com/docs/en/channels

I haven't used OpenClaw since then ...

coldtea · 2026-04-20T11:41:43 1776685303

Newb technies love it.

mathgladiator · 2026-04-20T09:24:02 1776677042

Agent environments like OpenClaw are in the toy phase, and OpenClaw is teaching people how to build things with agents in a toy-like and unreliable way. I used my understanding of OpenClaw to build scalable + secure + auditable agent infrastructure in my platform such that I can build products that other people can use.

bayindirh · 2026-04-20T09:26:54 1776677214

We had better agent infrastructures (namely JADE) back in the day. I worked with them, and now these things look like flimsy 50¢ plastic toys to me, too.

rimliu · 2026-04-20T10:32:33 1776681153

I am also surprised by the number of people willing to outsource their lives.

eucyclos · 2026-04-20T23:35:06 1776728106

Just the boring bits. I don't need openclaw to lounge by a pool thinking about macro trends for me.

jillesvangurp · 2026-04-20T13:53:02 1776693182

I talk to a lot of business people that are interested in automating very basic things in their inbox, on their Google drive, in CRMs, etc. The reason is not that they want to be cool and hip but because they are forced to spend lots of their precious time doing very dull and repetitive things. Promising to take some of that pain away is a really easy sell. Hence all the hype around OpenClaw.

If you look around in the business world, there is an absurdly large number of people still doing all sorts of things manually that they probably shouldn't. And its costing them money. Even before AI that was true. But now it's increasingly becoming obvious to these people that there are solutions out there that might work. There's a fair amount of FOMO on that front with more clued in people that have heard of other people allegedly being a bit smarter than them.

From a practical experience point of view, most people probably don't have the hands-on experience to make a good judgment just yet. "I tried Chat GPT once and it hallucinated" doesn't really count as valid experience at this point and many non-technical people are still at that level. There generally are a lot of headless chickens making absurd claims (either way) about what these systems can and cannot do making sweeping statements about how possible or impossible things are.

If you take the time and sit down to automate a few things you'll find that: 1) the tools aren't great right now 2) there are lots of basic plumbing issues that get in the way 3) fixing those plumbing issues is not rocket science and something anyone with basic CLI or scripting skills can solve easily 4) you can actually outsource most of that stuff to coding agents. 5) if you figure some of the basics out, you can actually make OpenClaw or similar systems do things that are valuable. 6) Most people that aren't programmers won't get very far given the current state of tools. 7) this might change rapidly as better tools become available. 8) people generally lack the imagination to see how even basic solutions could work for them with these systems.

I have an OpenClaw up and running for our company. It is doing some basic things that are useful for us. After solving some basic plumbing issues, it's now a lot easier to make it do new things. It's not quite doing everything just yet (lots more plumbing issues to solve) and we have our healthy hesitations about letting it loose on our inboxes. But it's not useless or without value. Every plumbing issue we solve unlocks a few more use cases. There's a bit of a gold rush right now of course. And "picks and shovels" people like myself are probably going to do a brisk business.

You can wait it out or tap into the action now. That's your choice. But try making it an informed choice. And no better experience than the first-hand type.

cl0ckt0wer · 2026-04-20T11:40:00 1776685200

Mostly it's fun. It'll so some light infra management for me too.

mark_l_watson · 2026-04-20T12:24:34 1776687874

I ran OpenClaw in a container, on a VPS without connection to messaging systems, so perhaps that is why I didn't get value.

Similarly, I have been using Hermes Agent also inside a container, and on a VPS with only access to a local directory in the VPS with a dozen active projects on GitHub. I don't give it access to my GitHub credentials, but allow it to work in whatever branch is checked out.

This setup is fabulously productive. I use it about every other day to perform some meaningful task for me. It is inexpensive also. A task might take 20 minutes and cost $0.25 in GLP-5.1 API costs.

So TLDR: out of the box, I use Hermes at least one hour a week and find it to be a wonderful tool.

surgical_fire · 2026-04-20T11:17:53 1776683873

No.

But I am someone that, for example, dislikes home automation. Know that thing that you ask Alexa to open your curtains? I think that is cringe af.

Maybe there's an overlap with the crowd that likes that.

dankobgd · 2026-04-20T10:01:12 1776679272

no, it's only for scammers

iugtmkbdfil834 · 2026-04-20T09:15:10 1776676510

Eh, buddy says he uses them for his network and, apparently, some light IT maintenance for his family members. So far it seems to be working for him. I am not that brave.

piker · 2026-04-19T11:22:46 1776597766

As a kid who was interested in stuff like this in the 90s, the ads were part of the enjoyment for me. You could look at components, have rounds-to-zero idea what they did but let your imagination soar at the possibility of stringing them together into something new.

piker · 2026-04-17T19:32:03 1776454323

That was a great promise before the models starting becoming "moody" due to their proprietors arbitrarily modifying their performance capabilities and defaults without transparency or recourse.

mh- · 2026-04-17T23:09:42 1776467382

I still haven't seen any statistically sound data supporting that this is happening on the API (per-token pricing.)

If you've got something to share I'd love to see it.

jessermejia · 2026-04-18T08:15:40 1776500140

There's an interesting analysis here: https://github.com/anthropics/claude-code/issues/42796

>The most striking row is user prompts: 5,608 in February vs 5,701 in March. The human put in the same effort. But the model consumed 80x more API requests and 64x more output tokens to produce demonstrably worse results.

mh- · 2026-04-18T18:21:09 1776536469

Sorry, "this" referred to the parent comment's claim.

> models starting becoming "moody" due to their proprietors arbitrarily modifying their performance capabilities

The tokenizer changes are measurable, the above is quite difficult to quantify.

There are a few sites floating around that purport to, but all of them have fatal flaws in their methodology.

fragmede · 2026-04-19T15:34:08 1776612848

Unfortunately, LLM performance isn't an exact science and some observations are going to be subjective. Observations like ChatGPT being "lazy" in the Winter. Wanting to form opinions based on hard data, aka science, and not vibes is entirely reasonable but doesn't make the vibes a figment of imagination. Or as Jeff Bezos put it, "When the data and the anecdotes disagree, the anecdotes are usually right." And while he's not a scientist, his success does put some weight behind that quote. (as does digging deeper in what he meant by that.)

piker · 2026-04-17T16:35:39 1776443739

Yes! And music! What a social thing listening to a CD or watching MTV in someone's room used to be. Now it's just isolating.

piker · 2026-04-15T08:08:10 1776240490

This is great advice (that we need to follow) but needs to be updated for 2026. The information value of providing (or receiving) a demo has dropped to roughly zero with vibe coding. Today, an apparently functional and useful product can be produced and demoed in minutes, but that demo provides absolutely zero information into the technical capabilities of the demoing team to follow through on promises with polish and at scale. It doesn't reflect a studied architecture or edge case handling. It basically only shows a vision, which can be tailored to perfectly mirror the recipient's expressed desire even though it's absolute vaporware. This makes it even harder to sell to enterprise in 2026 when the scene is awash in such noise.

operatingthetan · 2026-04-15T08:11:56 1776240716

>that demo provides absolutely zero information into the technical capabilities of the demoing team to follow through on promises with polish and at scale.

With vibe coding comes vibes-based capital. I'm only half kidding.

QuantumGood · 2026-04-15T17:25:56 1776273956

Speed to market has always been a factor. Venture prioritizing this factor due to AI accelarating speed is probably expanding ... for now. In bubbles, speed tends to rise to a higher priority.

Yes, first/fast is sometimes a negative factor (e.g. first to market doesn't mean best, second to market can take advantage of proof of market proviced by first, etc.)

keiferski · 2026-04-15T08:40:29 1776242429

In my experience demos are half about the product and half about the team / company behind it. So I wouldn’t call its value zero: part of the reason a potential client is asking for a demo is to see if there’s actually a real, intelligent company behind the product.

bonesss · 2026-04-15T10:57:44 1776250664

The sales pitch needs to compete with other pitches, so I gotta imagine in a vibe-heavy market a solid sales team is gonna lead with all the stuff you can’t vibe.

Customer transaction numbers, service response times, human staffing for VIP customer service, and human engineers who are recognized domain experts. The cliche live call to customer support with some hairy-ass customer specific problem.

Plus vibe-upselling of vibe-integrations for whatever Wonderful Engineering the customer has with your profit centres.

oliver236 · 2026-04-15T09:52:05 1776246725

AI demos are 5% of the product

stogot · 2026-04-15T11:24:40 1776252280

I’ve seen AI demos that are 100% the product, and the “company” won’t renew the $10 domain at the end of the year

appplication · 2026-04-16T02:16:47 1776305807

No offense but that is a wild statement

cmrdporcupine · 2026-04-15T08:38:58 1776242338

Right, and the story now shifts to: What's your customer service & support model? How can you prove this is stable and that you can maintain it? Who is going to handle the pages in the middle of the night?

mlnj · 2026-04-15T09:22:45 1776244965

All those things are beyond the demo itself. Vibe-coded demos are just demos. There are stability, security and everything enterprise that still needs to be added to a demo to actually make it functional as a paid offering.

The hard problems still remain.

cmrdporcupine · 2026-04-15T15:06:52 1776265612

My point is that if I were an investor in the LLM-era I'd be shifting my attention to the answers to those questions more than I would to tech demo.

user_7832 · 2026-04-15T09:19:27 1776244767

> The information value of providing (or receiving) a demo has dropped to roughly zero with vibe coding.

Only if you're a software-only startup. If you have hardware, the entire article is still valid.

bjornroberg · 2026-04-15T10:53:24 1776250404

The artifact can be faked cheaply now, so the only buying signal left is commitment. That's exactly the "ruthless" move the post argues for, I think.

eiiejr · 2026-04-15T15:00:07 1776265207

Vision and strategy are more important than ever.

And frankly visually being able to explain how your product beats the competition is more important than writing lines of code for a product that could be DOA.

However not everyone can do this. So the scientific approach gets pushed.

piker · 2026-04-13T11:47:22 1776080842

> It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to it, they actually invested in a huge number of features.

The other issue is that yes, perhaps most users only use 20% of the features, but each user uses a different 20% of the features in products like Word. Trust me, it's super hard to get it right even at the end-user level, let alone the enterprise level like you say.

bluGill · 2026-04-13T13:41:31 1776087691

There are at most 5% of the features of word that are common to everyone. Things like spell check everyone uses. Actually I suspect it is more like 0.1% of the features are common, and most people use about 0.3% of the features and power users get up to 5% of the features - but I don't have data, just a guess.

piker · 2026-04-13T13:43:40 1776087820

Yeah but 98% of Word features were buried in like 2004. They were added when it was a selling point to use unicorn and gnome icons as your table border in under 100mb of RAM. So we’re talking about 20% of the limited set of features that remain not just for backwards compatibility.

jimbokun · 2026-04-13T16:05:54 1776096354

And there's some company out there that has very important Word documents that will fail to open if you take away the unicorn and gnome icons table border feature.

piker · 2026-04-13T09:26:19 1776072379

Tritium, the legal IDE: https://tritium.legal

This month we're focused on:

- first-party, native DMS integration;

- provider-agnostic agentic workflows; and

- enterprise-grade redlining

But of more interest to this group is probably our blog! Our latest post is about Gary Kildall's blunder quibbling over an NDA redline with IBM who was looking to give its entire enterprise away: https://tritium.legal/blog/redlining.

piker · 2026-04-12T20:57:27 1776027447

Also offers an explanation for their recent move down the stack including silliness like writing Word add-ins.