
> We are a very long way from AGI.

I don't think so, the scaling laws haven't failed so far. I fully expect that making the model bigger and training it on more data will make it better at logic.

For a nice example with image models, Scott Alexander made a bet that newer image models would be able to do the things that Dall-E 2 gets wrong. [1] (This post also discusses how GPT-3 could do many things that GPT-2 got wrong.) He won the bet three months later through Imagen access. [2]

[1]: https://astralcodexten.substack.com/p/my-bet-ai-size-solves-...
[2]: https://astralcodexten.substack.com/p/i-won-my-three-year-ai...



I don’t know, isn’t the safer bet that scaling will eventually reach a dead end? I honestly fail to see how a language model could “execute” a sequence of reasoning steps, as it doesn’t think in a symbolic way. Do correct me if I’m wrong, but that would require a complex rearchitecture, so I’m not sure we are any closer; we just have a very impressive, smart search engine now.


It’s not just a safe bet but almost guaranteed. Humans combine their internal language models with physical intuition and experimentation from the moment they are born. There is zero chance that an AI can understand the physical world without access to it [1]. Until it has that access, it’s no more than a glorified context-specific Markov chain generator.

[1] Henceforth called Kiselev’s conjecture, a corollary of Moravec’s paradox: https://en.m.wikipedia.org/wiki/Moravec's_paradox


It's possible for models to learn a lot about everyday physics from videos.


No, it isn't. Not yet.


You said "There is zero chance that an AI can understand the physical world without access to it," which is wrong. It is possible. Using videos is an active research area, e.g. https://proceedings.neurips.cc/paper/2021/hash/07845cd9aefa6... or https://arxiv.org/abs/2205.01314


Thank you for the links, it's fascinating!

Fact is, without a feedback loop that can run physical experiments like infants do from the moment they're born, I highly doubt these models will develop a useful intuition using just video. Hence the conjecture.


For text data, we probably don't have more than one more order of magnitude of data left.



