Not just AWS, looks like Anthropic uses it heavily as well. I assume they get plenty of handholding from Amazon though. I'm surprised any cloud provider does not invest drastically more into their SDK and tooling, nobody will use your cloud if they literally cannot.
Well, AWS says Anthropic uses it, but Anthropic isn’t exactly jumping up and down telling everyone how awesome it is, which tells you everything you need to know.
If Anthropic walked out on stage today and said how amazing it was and how they’re using it, the announcement would carry a lot more weight. Instead… crickets from Anthropic in the keynote.
AWS has built 20 data centers in Indiana filled with half a million Trainium chips explicitly for Anthropic, and Anthropic is using them heavily. The press announcement Anthropic made about Google TPUs is essentially the same one they made a year ago about Trainium. Hell, even in the Google TPU press release they explicitly mention that they are still using Trainium as well.
I met an AWS engineer a couple of weeks ago, and he said Trainium is actually being used for Anthropic model inference, not for training. Inferentia is basically defective Trainium chips that nobody wants to use.
With GCP announcing they built Gemini 3 on TPUs, the opposite is true. Anthropic is under pressure to show they don’t need expensive GPUs. They’d be catching up at this point, not leaking some secret sauce. There’s no reason for them not to boast on stage today unless there’s nothing to boast about.
Anthropic is not going to interrupt their competitors if their competitors don't want to use Trainium. Neither would you, I, or anyone else. For them, doing so has only potential downside and no upside at all.
From Anthropic's perspective, if the rest of us can't figure out how to make trainium work? Good.
Amazon will fix the difficulty problem with time, but that's time Anthropic can use to press their advantages and entrench themselves in the market.
> I'm surprised any cloud provider does not invest drastically more into their SDK and tooling
I used to work for an AI startup. This is where Nvidia's moat is - the tens of thousands of man-hours that have gone into making the entire AI ecosystem work well with Nvidia hardware and not much else.
It's not that they haven't thought of this; it's that they don't want to hire another 1k engineers to do it.
>I'm surprised any cloud provider does not invest drastically more into their SDK and tooling, nobody will use your cloud if they literally cannot.
Building an efficient compiler from high-level ML code down to a TPU-class accelerator is actually quite a difficult software engineering feat, and it's not clear that Amazon has the kind of engineering talent needed to build something like that. Not like Google, which has developed multiple compilers and language runtimes.