I guess the argument is that most AI research is supported by big tech, which has heavily invested in the deep learning approach.
If that funding were funneled to research groups working on alternative approaches, maybe we'd see the same amount of progress in AI using another approach.
As a member of the research community: that's nonsense. As already pointed out, academic groups (which are by no means dependent on big tech) would jump all over that. Mamba has been out long enough that you'd already see tons of papers on arXiv showing Mamba dominating transformers in all sorts of applications. But that's not happening, despite the ton of hype. That doesn't mean Mamba is nonsense, just that it isn't the immediate transformer killer. It remains to be seen whether something comes of it, eventually.
As a member of the research community: that's nonsense. Publishing is an extremely noisy process in ML and is getting increasingly difficult for smaller labs that don't collaborate with big tech. Reviewers' go-to objections are: needs more datasets, needs more scale, not novel. The easiest way to get past this is to work off of pretrained models. This is probably most obvious in the NLP world.
I agree that Mamba doesn't solve everything and it still needs work. But I disagree with the logic that there isn't an issue of railroading.
What’s the main difference between an ape’s brain and a human brain? Scale. So that’s the train we’re riding at the moment. No roadblocks yet, aside from cost.
> What’s the main difference between an ape’s brain and a human brain? Scale.
This is incredibly naive, with absolutely no scientific basis. There is no evidence that the difference is one of scale, whether of data or of architecture.
There are a number of animals with larger brains in terms of both mass and total number of neurons. An African elephant has roughly 3x the number of neurons humans have. Dolphins beat humans in total surface area. Neanderthals are estimated to have had larger brains too! It isn't mass, neuron count, neuron density, or surface area. We aren't just scaled-up chimps.
Other animals with larger brains might have other bottlenecks preventing them from reaching full potential of their intelligence. Neanderthals might have been smarter than us, but went extinct for reasons not related to intelligence.
But my point stands: our brains evolved directly from ape brains, and the main difference between them and ours is brain size.
> Other animals with larger brains might have other bottlenecks
>>> What’s the main difference between an ape’s brain and a human brain? Scale.
Your argument is inconsistent. Clearly it isn't all scale, or we'd use other things besides transformers. Different architectures scale in different ways, and everything has different inductive biases. No one doubts scale is important, but there's a lot more to it.
Scale is all we need for transformers (so far). It might also be all we need for ape brains. It’s not all we need for whatever elephant or dolphin brains evolved from.
When this stops being the case for transformers, we will need something else. I’m just pointing out it’s not the case yet.
I see no evidence of this in biology nor in ML. I've read those scale papers. I've worked on scale myself. I'll bet the farm that scale isn't all you need. But I won't be surprised if people say that it is all scale.
If you really think it is all scale, train a 7T-parameter ResNet- or MLP-based model for NLP. If scale is all you need, make an LLM without DPO or RLHF. If scale is all you need, make SD3 with a GAN. Or what about a VAE, a normalizing flow, an HMM? Do it with different optimizers. Do it with gradient-free methods. Do it with different loss functions.
The bitter lesson wasn't "scale is all you need." That's just a misinterpretation.
Edit: It's fine to disagree. We can compete on the ideas and methods. That's good for our community. So continue down yours, and I'll continue down mine. All I ask is that since your camp is more popular, you don't block those on my side. If you're right, we'll get to AGI soon. If we're right, we still might. But if we're right and you all block us, we'll get another AI winter in between. If we're right and you all don't block us, we can progress without skipping a beat. Just don't put all your eggs in one basket. It isn't good for anyone.
I said “scale is all you need for transformers”. That has been true since GPT-1. The best way to improve our best model today still seems to be “make it larger and train it on more data”.
If you disagree please suggest a better way, or at least provide evidence that scaling up no longer works for transformers.
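For what it's worth, “make it larger and train it on more data” is what the published scaling-law fits formalize. A minimal sketch of a Chinchilla-style loss curve, L(N, D) = E + A/N^α + B/D^β; the constants below are roughly the Hoffmann et al. fit, but treat them as illustrative rather than authoritative:

```python
def loss(params_b, tokens_b, e=1.69, a=406.4, b=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style loss estimate: L(N, D) = E + A/N^alpha + B/D^beta.

    params_b: model parameters in billions; tokens_b: training tokens in billions.
    Constants are approximately the published fit, used here for illustration only.
    """
    n = params_b * 1e9
    d = tokens_b * 1e9
    return e + a / n**alpha + b / d**beta

# Under this fit, scaling both parameters and data monotonically
# lowers the predicted loss, toward the irreducible term E.
print(loss(7, 140))      # a smaller model / dataset
print(loss(70, 1400))    # 10x larger on both axes: lower predicted loss
```

Whether the fitted power law keeps holding at ever-larger N and D is exactly the open question being argued here; the formula only describes the regime it was fit on.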
Of scale? I would think not. I would say they are evidence against scale, because they are more an argument for multi-agent systems. Scale is about a singular framework. What that means is debatable, though (anything we call a singular network can be decomposed into sub-networks; it's messy), hence the other part of my comment about non-scale solutions being claimed as scale.
> I said “scale is all you need for transformers”
No you didn't. What kicked this all off was
> What’s the main difference between an ape’s brain and a human brain? Scale.
Scary how someone can be so confident in their wrong information.
> An African Elephant has roughly 3x the number of neurons humans have.
An African elephant's brain is not a scaled-up chimp brain in any way. African elephants have fewer cortical neurons than a chimp, and roughly a third of the number that humans have.
> Dolphins beat humans in total surface area.
Animals even less related to humans and chimps, with no prehensile appendages, living in an environment where building stuff is exceedingly difficult. And of course their brains are obviously different from any great ape.
> Neanderthals are estimated to have had larger brains too!
And they were just as smart as us, and also had a scaled-up chimp brain.
animal :: cortical neurons (billions) :: total neurons (billions)
Human :: 16 :: 86
Gorilla :: 9.1 :: 33
Chimp :: 6 :: 22
African Elephant :: 5.6 :: 251
Chimps are generally considered more intelligent than gorillas.
Bottlenose dolphins have 11-15 billion cortical neurons, while humans are in the range 14-18 billion (the ranges reflect measurement uncertainty). It's also worth noting that these dolphins have a larger brain mass (1.6 kg) and a larger cortical surface area (3700 cm²) than humans (1.3 kg and 2400 cm², respectively).
> with no prehensile appendages, living in...
So, more than scale. Glad we agree. Seems you also agree that architecture matters.
> Chimps are generally considered more intelligent than gorillas.
And chimps are genetically more similar to humans than gorillas. A chimp brain is more similar to a human brain than a gorilla brain.
> So more than scale. Glad we agree.
We absolutely do not agree. Notice how nobody suggested that a human brain is a scaled up version of an axolotl's brain? Yeah, that didn't happen. I do wonder why?
As someone who's worked at several NVIDIA competitors, including Groq, I can guarantee you that, based on my knowledge of existing products, they would be able to make much more money if they had lower-memory-footprint models. Given the amount of VC capital deployed for this (on the order of hundreds of millions), I don't believe this is a reasonable take.
Sure, NVIDIA et al. may not want that (although, again, I don't see why; they can't produce chips fast enough either, so being able to provide models for customers now ought to be good), but there's so much money out there that does.
For MSFT, AMZN, GOOG, the competitive advantage comes from having huge datasets (that Nvidia doesn't have). It's a symbiosis that benefits the data-rich and GPU-rich players.
This still makes little sense, as scale will always matter. If you can drop the compute cost of a model by 10x, you can push model integrity/intelligence/speed, etc., beyond what your compute-bound competitors have.
Simply put, for the time being huge datasets are going to be needed, and those with bigger (and cleaner?) datasets will have a better-behaved model.