I guess the argument is that most AI research is supported by big tech, which has heavily invested in the deep learning approach.
If that funding were funneled to research groups working on alternative approaches, maybe we'd see the same amount of progress in AI using another approach.
As a member of the research community: that's nonsense. As already pointed out, academic groups (which are by no means dependent on big tech) would jump all over that. Mamba has been out long enough that you'd already see tons of papers on arXiv showing Mamba dominating transformers in all sorts of applications. But that's not happening, despite the ton of hype. That doesn't mean Mamba is nonsense, just that it isn't the immediate transformer killer. It remains to be seen whether something comes of it, eventually.
As a member of the research community: that's nonsense. Publishing is an extremely noisy process in ML and is getting increasingly difficult for smaller labs that don't collaborate with big tech. Reviewers' go-to objections are: needs more datasets, needs more scale, not novel. The easiest way to get past this is to work off of pretrained models. This is probably most obvious in the NLP world.
I agree that Mamba doesn't solve everything and it still needs work. But I disagree with the logic that there isn't an issue of railroading.
What’s the main difference between an ape’s brain and a human brain? Scale. So that’s the train we’re riding at the moment. No roadblocks yet, aside from cost.
> What’s the main difference between an ape’s brain and a human brain? Scale.
This is incredibly naive, with absolutely no scientific basis. There is no evidence that the difference is one of scale, whether of data or of architecture.
There are a number of animals with larger brains in terms of both mass and total number of neurons. An African elephant has roughly 3x the number of neurons humans have. Dolphins beat humans in total surface area. Neanderthals are estimated to have had larger brains too! It isn't mass, neuron count, neuron density, or surface area. We aren't just scaled-up chimps.
Other animals with larger brains might have other bottlenecks preventing them from reaching full potential of their intelligence. Neanderthals might have been smarter than us, but went extinct for reasons not related to intelligence.
But my point stands: our brains evolved directly from ape brains, and the main difference between them and ours is brain size.
> Other animals with larger brains might have other bottlenecks
>>> What’s the main difference between an ape’s brain and a human brain? Scale.
Your argument is inconsistent. Clearly it isn't all scale, or we'd use other things besides transformers. Different architectures scale in different ways, and everything has different inductive biases. No one doubts scale is important, but there's a lot more to it.
Scale is all we need for transformers (so far). It might also be all we need for ape brains. It’s not all we need for whatever elephant or dolphin brains evolved from.
When this stops being the case for transformers, we will need something else. I’m just pointing out it’s not the case yet.
I see no evidence of this in biology nor in ML. I've read those scale papers. I've worked on scale myself. I'll bet the farm that scale isn't all you need. But I won't be surprised if people say that it is all scale.
If you really think it is all scale, train a 7T-parameter ResNet- or MLP-based model for NLP. If scale is all you need, make an LLM without DPO or RLHF. If scale is all you need, make SD3 with a GAN. Or what about a VAE, a normalizing flow, an HMM? Do it with different optimizers. Do it with gradient-free methods. Do it with different loss functions.
The bitter lesson wasn't "scale is all you need." That's just a misinterpretation.
Edit: It's fine to disagree. We can compete on the ideas and methods. That's good for our community. So continue down yours, and I'll continue down mine. All I ask is that since your camp is more popular, you don't block those on my side. If you're right, we'll get to AGI soon. If we're right, we still might. But if we're right and you all block us, we'll get another AI winter in between. If we're right and you all don't block us, we can progress without skipping a beat. Just don't put all your eggs in one basket. It isn't good for anyone.
I said “scale is all you need for transformers”. That has been true since GPT-1. The best way to improve our best model today still seems to be “make it larger and train it on more data”.
If you disagree please suggest a better way, or at least provide evidence that scaling up no longer works for transformers.
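For what it's worth, “make it larger and train it on more data” is what the published scaling-law fits formalize. A minimal sketch of a Chinchilla-style loss curve, L(N, D) = E + A/N^α + B/D^β; the constants below are roughly the Hoffmann et al. fit, but treat them as illustrative rather than authoritative:

```python
def loss(params_b, tokens_b, e=1.69, a=406.4, b=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style loss estimate: L(N, D) = E + A/N^alpha + B/D^beta.

    params_b: model parameters in billions; tokens_b: training tokens in billions.
    Constants are approximately the published fit, used here for illustration only.
    """
    n = params_b * 1e9
    d = tokens_b * 1e9
    return e + a / n**alpha + b / d**beta

# Under this fit, scaling both parameters and data monotonically
# lowers the predicted loss, toward the irreducible term E.
print(loss(7, 140))      # a smaller model / dataset
print(loss(70, 1400))    # 10x larger on both axes: lower predicted loss
```

Whether the fitted power law keeps holding at ever-larger N and D is exactly the open question being argued here; the formula only describes the regime it was fit on.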
Of scale? I would think not. I would say they are evidence against scale, because they are more an argument for multi-agent systems. Scale is about a singular framework. What that means is debatable, though (anything we call a singular network can be decomposed into sub-networks; it's messy), hence the other part of my comment about non-scale solutions being claimed as scale.
> I said “scale is all you need for transformers”
No you didn't. What kicked this all off was
> What’s the main difference between an ape’s brain and a human brain? Scale.
Scary how someone can be so confident in their wrong information.
> An African Elephant has roughly 3x the number of neurons humans have.
An African elephant's brain is not a scaled-up chimp brain in any way. African elephants have fewer cortical neurons than a chimp, and roughly a third of the number that humans have.
> Dolphins beat humans in total surface area.
Animals even less related to humans and chimps, with no prehensile appendages, living in an environment where building stuff is exceedingly difficult. And of course their brains are obviously different from any great ape.
> Neanderthals are estimated to have had larger brains too!
And they were just as smart as us, and also had a scaled-up chimp brain.
animal :: cortical neurons (billions) :: total neurons (billions)
Human :: 16 :: 86
Gorilla :: 9.1 :: 33
Chimp :: 6 :: 22
African Elephant :: 5.6 :: 251
Chimps are generally considered more intelligent than gorillas.
Bottlenose dolphins have 11-15 billion cortical neurons, while humans are in the range 14-18 billion (the ranges reflect measurement uncertainty). It's also worth noting that these dolphins have a larger brain mass (1.6 kg) and a larger cortical surface area (3700 cm²) than humans (1.3 kg and 2400 cm², respectively).
> with no prehensile appendages, living in...
So, more than scale. Glad we agree. Seems you also agree that architecture matters.
> Chimps are generally considered more intelligent than gorillas.
And chimps are genetically more similar to humans than gorillas. A chimp brain is more similar to a human brain than a gorilla brain.
> So more than scale. Glad we agree.
We absolutely do not agree. Notice how nobody suggested that a human brain is a scaled up version of an axolotl's brain? Yeah, that didn't happen. I do wonder why?
As someone who's worked at several NVIDIA competitors, including Groq, I can guarantee you that, based on my knowledge of existing products, they would be able to make much more money if they had lower-memory-footprint models. Given the amount of VC capital deployed for this (on the order of hundreds of millions), I don't believe this is a reasonable take.
Sure, NVIDIA et al. may not want that (although, again, I don't see why; they can't produce chips fast enough either, so being able to provide models for customers now ought to be good), but there's so much money out there that does.
For MSFT, AMZN, GOOG, the competitive advantage comes from having huge datasets (that Nvidia doesn't have). It's a symbiosis that benefits the data-rich and GPU-rich players.
This still makes little sense, as scale will always matter. If you can drop the compute cost of a model by 10x, you can push model integrity/intelligence/speed, etc., beyond what your compute-bound competitors have.
Simply put, for the time being huge datasets are going to be needed, and those with bigger (and cleaner?) datasets will have a better-behaved model.