pugio's comments | Hacker News

Thanks, this might be exactly what I'm looking for.

I see you have support for vanilla JS and Svelte, but it's unclear whether you get all the same functionality if you don't use React. Is React the only first class citizen in this stack?


Thank you.

> Is React the only first class citizen in this stack?

Each system gets the same functionality. We centralize the critical logic for the client SDK in "@instantdb/core". React, Svelte, TanStack, React Native et al. are wrappers around that core library.

The one place where it's lacking a bit is the docs. We have specific docs for each library, but a lot of other examples assume React.

We are improving this as we speak. For now, the docs only lightly assume React, so it's relatively straightforward to figure out what needs to happen for the library of your choice.
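Concretely, using the core package directly looks something like this (a minimal sketch, assuming the usual init / subscribeQuery / transact entry points; check the vanilla JS docs for exact names and signatures):

    // TypeScript sketch against @instantdb/core; names are assumptions
    // based on typical usage, not copied from any specific docs version.
    import { init, tx, id } from "@instantdb/core";

    const db = init({ appId: "YOUR_APP_ID" }); // placeholder app id

    // Subscribe to a query; the callback fires on every change.
    // This is the same subscription the React/Svelte wrappers build on.
    const unsubscribe = db.subscribeQuery({ todos: {} }, (resp) => {
      if (resp.error) {
        console.error(resp.error.message);
        return;
      }
      console.log("todos:", resp.data?.todos);
    });

    // Writes go through transact(), same as in the wrappers.
    db.transact(tx.todos[id()].update({ text: "hello", done: false }));

    // Call unsubscribe() when you no longer need updates.

The framework packages are then essentially thin adapters that expose that subscription as a hook or store.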


Any thoughts on a potential TanStack DB integration?

I love TanStack's rich front end. Y'all have quite an amazing system, and I'm wondering whether there are any thoughts on how your pretty substantial front-end side might be adapted to TanStack DB. https://github.com/TanStack/db/tree/main/packages


Thanks, this helped crystallize something for me: the play the AI labs are making is anti-fragile (in the Nassim Taleb sense):

> The very act of resisting feeds what you resist and makes it less fragile to future resistance.

At least along certain dimensions. I don't think the labs themselves are antifragile. Obviously we all know the labs are training on everything (so write/act the way you want future AIs to perceive you), but I hadn't really focused on how they're absorbing the innovation that they stimulate. There's probably a biological analog...

Well, there are many, and I quote this AI response here for its chilling parallels:

> Parasitic castrators and host manipulators do something related. Some parasites redirect a host’s resources away from reproduction and into body maintenance or altered tissue states that benefit the parasite. A classic example is parasites that make hosts effectively become growth/support machines for the parasite. It is not always “stimulate more tissue, then eat it,” but it is “stimulate more usable host productivity, then exploit it.” (ChatGPT 5.4 Thinking. Emphasis mine.)


Instead of anti-fragility, I'd point you to the law of requisite variety. You'll notice that all AI improvements are insanely good for a week or two after launch. Then you'll see people stating that 'models got worse'. What in fact happened is that people adapted to the tool, but the tool stopped adapting. We're using AI as variety-resistant, adaptable tools, but we miss the fact that most deployments nowadays don't adapt back to us as fast.

New models literally do get worse after launch, due to optimization. If you charted performance over time, it'd look like a sawtooth, with a regular performance drop during each optimization period.

That's the dirty secret with all of this stuff: "state of the art" models are unprofitable due to high cost of inference before optimization. After optimization they still perform okay, but way below SOTA. It's like a knife that's been sharpened until razor sharp, then dulled shortly after.


> If you charted performance over time, it'd look like a sawtooth

People have, though, and it doesn't show that. I think it's more that people get hit by the placebo effect and the novelty effect, followed by the models' by-definition non-determinism, leading people to say things like "the model got worse".


Is this insider info? The 'charted performance' caught my eye instantly. A couple of things I find odd, though: why a sawtooth? It would more likely be a square wave, as I'd imagine they roll out the cost-saving version quite fast per cohort. Also, aren't they unprofitable either way? Why would they do it for 'profitability'?

It's rumors based on vibes. There are attempts to track and quantify this with repeated model evaluations multiple times per day, but no sawtooth pattern has emerged as far as I know.

I don't want to go too far down the conspiracy rabbit hole, but the vendors know everyone's prompts so it would be trivial for them to track the trackers and spoof the results. We already know that they substitute different models as a cost-saving measure, so substituting models to fool the repeated evaluations would be trivial.

We also already know that they actively seek out viral examples of poor performance on certain prompts (e.g. counting Rs in strawberry) and then monkey-patch them out with targeted training. How can we be sure they're not trying to spoof researchers who are tracking model performance? Heck, they might as well just call it "regression testing."

If their whole gig is an "emperor's new clothes" bubble situation, then we can expect them to try to uphold the masquerade as long as possible.


If a claim is unfalsifiable, it contains no information

What I said is quite far from unfalsifiable. Any number of insiders could step forward and set the record straight.

Not really. Only in the affirmative. If an insider said they don't do that, you'd still think they might do that.

It's not insider info, it's common knowledge in the industry (Google "model optimization"). I think they are unprofitable either way, but unoptimized models burn runway a lot faster than optimized ones.

The reason it's not a square wave is that new optimization techniques are always in development, so you can't apply everything immediately after training a new model. I also think there's a marketing reason: if the performance of a brand-new model declines rapidly after release, people are going to notice much more readily than with a gradual decline. The gradual decline is thus engineered by applying different optimizations gradually.

It also has the side benefit that the future next-gen model may be compared favourably with the current-gen optimized (degraded) model, setting up a rigged benchmark. If no one has access to the original pre-optimized current-gen model, no one can perform the "proper" comparison to be able to gauge the actual performance improvement.

Lastly, I would point out that vendors like OpenAI are already known to substitute previous-gen models if they determine your prompt is "simple." You should also count this as a (rather crude) optimization technique because it's going to degrade performance any time your prompt is falsely flagged as simple (false positive).


You have a point but current LLM architectures in particular are very fragile to data poisoning [1,2].

[1] https://www.anthropic.com/research/small-samples-poison

[2] https://arxiv.org/abs/2510.07192


Yes, there are quite a few anti-AI projects. https://old.reddit.com/r/badphilosophy/wiki/index

No idea why you're being downvoted. We can't yet even demonstrate that LLMs will withstand training on their own output as they pollute the Internet.

Thanks for that. And here I was somehow hanging around on 4.5.3.


I've never yet been "that guy" on HN but... the title seems misleading. The actual title is "A Ramsey-style Problem on Hypergraphs" and a more descriptive title would be "All latest frontier models can solve a frontier math open problem". (It wasn't just GPT 5.4)

Super cool, of course.


Lately my favorite podcast to listen to has been the audio version of Zvi's blog: https://dwatvpodcast.substack.com/p/claude-code-claude-cowor... .

It's AI narrated, but at this point if I heard Zvi's actual voice I think I would be confused. It's really well done, and uses different voices for each new person being quoted. It also has really good narrated image descriptions.

Zvi's articles are exhaustively long - before I was able to listen to them, I'd get tired trying to read the whole thing. Now it's my favorite way to keep up with AI.


I've been thinking about sovereign AI a lot lately. About a year ago I was wondering what each country would be doing, and looking at places like e.g. Australia (which has pretty strict data residency laws for certain industries) - at that point I thought about advocating for why such countries should train their own models, but now I'm having a harder time justifying that point.

I can't see how any of these other countries could even approach the level of capability of the big three providers. I can imagine only a handful of countries that could even theoretically put enough resources towards reaching the SOTA frontier. Sure, even a model of ~2024 capability has plenty of valid use cases today, but I'm concerned that people will just go with the big three because what they offer is still so, so much better.

Not trying to discourage efforts like these, but is there really a good case for working on them? Or perhaps there's a state/national case, but it's harder for me to see a real business case.


India has a lot of languages, and people need access to something that allows them to do basic stuff with it. I don't think relying on the US is a long-term solution.

An example: I am into proofreading and language learning, and am forced to rely on Claude/Gemini to extract text from old books because of the lack of good Indian models. I started with regular Tesseract, but its accuracy outside of the Latin alphabet is not that great. Qwen 3/3.5 is good with the Bombay style of Devanagari but craps the bed with the Calcutta style. And neither is great with languages like Bengali. In contrast, Claude can extract Bengali text from terrible scans and old printing with something like 99+ percent accuracy.
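For what it's worth, the extraction step itself is simple. Here's a rough sketch of sending a scanned page to Claude (assuming the Anthropic TypeScript SDK with a base64 image block; the model name and prompt are placeholders, not a recommendation):

    // Rough sketch: OCR-style transcription of a scanned page via the
    // Anthropic messages API. Model name and file path are placeholders.
    import fs from "node:fs";
    import Anthropic from "@anthropic-ai/sdk";

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    // Load a scanned page and base64-encode it for the image block.
    const page = fs.readFileSync("scan-page-042.jpg").toString("base64");

    const msg = await client.messages.create({
      model: "claude-sonnet-4-5", // placeholder; use whichever model you prefer
      max_tokens: 4096,
      messages: [
        {
          role: "user",
          content: [
            {
              type: "image",
              source: { type: "base64", media_type: "image/jpeg", data: page },
            },
            {
              type: "text",
              text: "Transcribe the Bengali text on this page exactly, preserving line breaks.",
            },
          ],
        },
      ],
    });

    // The response is a list of content blocks; print the text one.
    const block = msg.content[0];
    console.log(block.type === "text" ? block.text : "");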

Models specifically targeted at Indian languages and content will perform better within that context, I feel.


Seems like you and the author are doing the same thing: speaking in absolutes. It's possible for "Anthropic" (or the summed vector of all the human decision makers within it) to have contracted with the military because it wants to make money AND it wants to help.

The questions are: "Help with what, precisely?" and "How much money versus how much value (/principles) compromise?"


I've worked for big corporations for a long time, and one of the first things I've learned is that individual motivations mean very little, if anything. At the end of the day, the bottom line is all that matters. And we know this is particularly true of big LLM companies given their track records.


Unfortunately it still fails my personal SVG benchmark (an educational 2D cross-section of the human heart), even after multiple iterations and screenshot feedback. Oh well, back to the (human) drawing board.


It sounds really cool, but I don't see any way of trying the model directly. I don't actually want a "Persona" or "Replica" - I just want to use the sparrow-one model. Is there any way to just make API calls to that model directly?


Do you have anything written up about how you're doing this? Curious to learn more...


I don't, but I should open-source this code. I was trying to sell to OEMs, though; that's why. Are you interested in licensing it?

