I think that should be true, but it doesn't hold up in practice.
I work with a good editor from a respected political outlet. I've tried hard to get current models to match his style: filling the context with previous stories, classic style guides, and endless references to Strunk & White. The LLM always ends up writing something filtered through tropes, so I inevitably have to edit quite heavily before my editor takes another pass.
It feels like LLMs have a layperson's view of writing and editing: they believe it's about tweaking sentence structure or swapping in a synonym, rather than thinking hard about what you want to say and what is worth saying.
I also don't think LLMs' writing capabilities have improved much over the last year or so, whereas coding has come on in leaps and bounds. Given that good writing is a matter of taste that lies beyond the direct expertise of most AI researchers (unlike coding), I doubt they'll improve much in the near future.
Germany has an anonymous support programme for people who feel paedophilic urges but don't wish to offend. I believe they've used that network for research, but I think it's probably quite a limited, and potentially biased, sample.
Carbon offsets are a sham, but you could just require them to pay directly for the energy infrastructure they actually need: if you need 1GW of electricity, develop 1GW of solar.
Sorry, I'm not picking up on the connection - could you expand? Do you think they should also pay for offsets alongside developing energy infrastructure?
I guess what I'm asking is how long it takes, soup to nuts, for the 1GW installation to become carbon neutral or better? I've read anywhere from 7 months to 25 years. Maybe it's dependent on location?
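Rough sketch of the arithmetic I have in mind, with made-up illustrative numbers (real embodied-energy and yield figures vary a lot by panel tech and site):

    # Energy-payback arithmetic for a solar install. Every number here is
    # an assumption for illustration -- embodied energy and annual yield
    # vary widely, which may be why published estimates range so much.
    embodied_kwh_per_kw = 2000  # assumed energy to manufacture + install 1 kW
    annual_yield_kwh_per_kw = {"sunny site": 2000, "cloudy site": 800}  # assumed output

    for site, yield_kwh in annual_yield_kwh_per_kw.items():
        print(f"{site}: payback ~{embodied_kwh_per_kw / yield_kwh:.1f} years")
    # sunny site: ~1.0 years, cloudy site: ~2.5 years -- location alone
    # moves the answer severalfold before touching the other assumptions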
Oh sure, I see what you mean - thanks for clarifying. On top of your point, it's true that CO2 has a prolonged impact on global temperature even after it's been 'removed' from the atmosphere, so even once solar pays back the original carbon investment, its impact lingers for a while.
I guess at a certain point you're getting at a more fundamental question about the value of AI (plus technology and everything else) - what level of environmental tradeoff is acceptable? One thing I slightly lament about the discourse is that the tradeoff is widely discussed in the case of AI, but not for the other stuff we do. I suspect most people aren't aware that the water use associated with eating a burger dwarfs a year of ChatGPT use, that a long-haul flight wipes out the emissions savings of a couple of years' veganism, or that renewables have their own impacts, like the demolition of Chile for copper.
Transformers do have a fixed input/output size though - that's what a context window is. It's just that, via scaling and algorithmic improvements, the length of usable context windows has increased to the point that they're much less of a bottleneck.
I think your points around parallelisation and the flexibility of quadratic attention are spot-on though.
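For anyone who hasn't seen it spelled out, here's a minimal sketch of why attention is quadratic in sequence length (a toy single-head version with assumed shapes; real models add heads, masking, etc.):

    import numpy as np

    # Minimal single-head attention. The (seq_len, seq_len) score matrix
    # is the quadratic term: double the context and the memory/compute
    # for the scores quadruples.
    def attention(q, k, v):
        scores = q @ k.T / np.sqrt(q.shape[-1])         # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v

    seq_len, d = 1024, 64
    q = k = v = np.random.randn(seq_len, d)
    out = attention(q, k, v)  # fine at 1024; the (n, n) scores dominate as n grows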
This opens up an interesting new avenue for corporate FOMO. What if you don't partner with Anthropic, miss out on access to their shiny new cybersec model, and then fall prey to a vuln that the model would have caught?
Did that happen to a lot of companies during the Log4Shell fiasco? I'm sure some companies had their permissions misconfigured in such a way that a malicious actor who could execute code on their servers could also drop their database and delete their backups.
Great piece. And a good excuse to read up on the use of the diaeresis in English (e.g. coördination, reëlection) to mark that the second of two adjacent vowels is pronounced separately - I hadn't seen the New Yorker's usage before.
I see Munroe's work as filling the same role in society as Socrates did in his time: not only commentary on current events, government, society, etc., but also expressing his viewpoints in a fashion accessible to everyone. Socrates paved the way to bring philosophy to the masses; Munroe uses a popular medium and comedy to the same effect.
The Gaussian Processes underpinning this work are hardly a product of the 'AI Hype Machine' - they've been around for decades, have strong statistical underpinnings, and are being widely explored for experimental design across many disciplines. Reflexive, poorly informed backlash against any variety of machine learning is no more productive than blindly hyping up LLMs.
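For the curious, GP regression is a few lines in scikit-learn - a toy sketch with made-up data and an assumed RBF kernel:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.array([[1.0], [3.0], [5.0], [6.0]])  # toy experimental inputs
    y = np.sin(X).ravel()                       # toy measured responses

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
    mean, std = gp.predict(np.array([[4.0]]), return_std=True)
    # the predictive std is what experimental-design loops use to pick
    # the next most informative trial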
Meta Platforms, Inc. featuring this technology with a title announcing “AI for American-produced cement and concrete” is, on the other hand, 1000% a product of the AI Hype Machine.
Sure, it's clearly marketing. I think a private company pursuing marketing via open research with open source code (including datasets) is a good trade. A hypey blogpost + research is better than no blogpost and no research.
A sidenote along these lines - I've recently done an MSc and found that the default approach to lectures is now to present slide decks. One of the profs, however, delivers a more traditional lecture, writing everything on a blackboard. I've found the second style far more effective, largely because writing caps the rate at which information can be conveyed. Because slides have no such bottleneck, they're often misused and overladen with information that is skipped over too quickly.
Do you have any evidence that inference revenue is growing faster than training costs? RLVR is significantly less compute-efficient than token-prediction pretraining - especially as labs try to train models to complete agentic tasks that take tens of minutes per rollout.
It's definitely true that they've increased their revenue rapidly. But at the same time, the 'scaling laws' the labs were first built around require exponentially scaling cost (10x the flops for each fixed reduction in training loss).
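As a toy illustration of what that power law implies (constants are made up, not fit to any real model family):

    # Toy power-law scaling curve: loss = a * compute**(-b).
    a, b = 10.0, 0.05

    def loss(flops):
        return a * flops ** (-b)

    for flops in (1e21, 1e22, 1e23):
        print(f"{flops:.0e} FLOPs -> loss {loss(flops):.3f}")
    # prints ~0.891, ~0.794, ~0.708: every 10x in compute multiplies loss
    # by the same factor (10**-b), so each comparable gain costs an order
    # of magnitude more flops than the last.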
If anything, a better look at the economics is a reason to look forward to one of them IPO-ing. I suspect the labs probably could cut R&D and turn a profit, but that might only work for one generation, until they get superseded by the competition.
There is no doubt that competition is what is driving unprofitability. So when people say AI can't be monetized, I laugh. Right now, foundational AI is unprofitable because of competition, not because they can't make money.
But this is exactly the problem - we have to take it on faith that inference is profitable because nobody actually knows. It’s hard to even define what that would mean, and while I am suspicious of claims that frontier lab CEOs are just out-and-out liars or bad people, defining and calculating the real cost of inference would be time- and labor-intensive in its own right and there is no strong incentive to do it other than “tech reporters are curious.” Until the IPO, we just won’t know.
A lot of people know. A lot of insiders have been saying tokens are profitable. Is there a conspiracy theory for everyone to lie? Including OpenAI, Anthropic CEOs, employees, Cursor management, inference providers of Chinese models?
Profitable on what basis? They generate more revenue than the cost of electricity? Does that factor in the cost to service the massive, multi-layer cake of debt that was necessary to even begin to serve inference in the first place - not from a training perspective but from a hardware and facilities perspective?
I’m not talking about training costs. I’m talking about startup costs. You have to pay for GPUs (or to rent data centers). You have to pay for the electricity that runs those data centers, and in a lot of cases these frontier labs are building the data centers on credit, so you need to pay for the construction, the materials, etc. If it was as simple as “running the GPUs costs less than we charge for it,” I might be inclined to agree. But the GPUs don’t just appear by magic.
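A back-of-envelope version of the distinction, with entirely made-up numbers:

    # Illustrative per-token costing: electricity vs. amortized hardware.
    # Every number here is an assumption for illustration, not lab data.
    gpu_cost_usd = 30_000   # assumed purchase price per GPU
    lifetime_years = 4      # assumed useful life before obsolescence
    power_kw = 1.0          # assumed draw incl. cooling overhead
    usd_per_kwh = 0.08      # assumed industrial electricity rate
    tokens_per_sec = 1_000  # assumed sustained serving throughput

    lifetime_secs = lifetime_years * 365 * 24 * 3600
    hardware_per_token = gpu_cost_usd / (tokens_per_sec * lifetime_secs)
    power_per_token = power_kw * usd_per_kwh / 3600 / tokens_per_sec

    print(f"hardware: ${hardware_per_token:.2e}/token")  # ~2.4e-07
    print(f"power:    ${power_per_token:.2e}/token")     # ~2.2e-08
    # With these assumptions, amortized hardware is roughly 10x the
    # electricity line, so "revenue covers electricity" and
    # "profitable" can diverge quite a bit.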
Right now, demand for GPUs far exceeds supply. Every cloud company is saying they're leaving money on the table because they don't have enough compute to serve the demand.
It seems like you're arguing that the bubble is going to collapse soon, like the author? How can it collapse when the demand is so much bigger than supply? Do you think the demand is fake? Or that AI will stop making progress from here on out?
The demand is real. The tech is real. The economics are completely unsustainable. Switching costs and barriers to entry are too low, operating costs are too high. And if the tech improves, it actually makes it even easier for competitors to swoop in and take market share. Not long ago, an agent that was 80% as good as SOTA was not usable. A year from now, an agent that is 80% as good as SOTA will be better than the best agent is today. We have it on good authority that today’s agents are very good, very useful. Why bother paying full price?
This is deeply ironic in a way. Because the whole premise of AI labor replacement is that AI does not need to be better than human labor, it just needs to be cheaper with acceptable performance. But the same is true one step down: discount AI doesn’t need to be better than bleeding-edge AI, it just needs to be cheaper with acceptable performance.