
At this point I'm comfortable putting transformers among the top three developments in machine learning history. The way things are headed, they may turn out to be one of the most important "discoveries" humankind has ever made.

I'm extremely optimistic about how transformers can recursively speed up progress in multiple areas of science. Transformers are reaching a point where they can demonstrate reasoning abilities in the ballpark of what you might expect from a human, and in certain areas they far exceed what any human is capable of. One of those areas is depth of knowledge: transformers (e.g. RETRO) can incorporate a library of knowledge far larger than any human can. Soon we will improve and harness this ability to the point where it may be pointless to form a scientific hypothesis without first "consulting" a large language model that has processed the entire library of scientific publications.



This paper shows we can combine models like Lego bricks, even without end-to-end training, by using language as an intermediate representation. That means more flexibility in training the models, each on its own dataset, and more ways they can be combined. By getting rid of fine-tuning, the models may retain their robustness to distribution shifts. Roughly what that looks like in code is sketched below.
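
A toy sketch of the idea, not the paper's actual pipeline: two frozen off-the-shelf models composed purely through text. The model names, the image file, and the prompt format are my own illustrative choices.

    # Two frozen models glued together with plain language: a captioner
    # turns an image into text, and a language model reasons over that text.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
    llm = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")

    caption = captioner("photo.jpg")[0]["generated_text"]   # vision -> language
    prompt = (
        f"Image description: {caption}\n"
        "Question: What is probably happening in this scene?\n"
        "Answer:"
    )
    answer = llm(prompt, max_new_tokens=40)[0]["generated_text"]  # language -> language
    print(answer)

Because the interface between the two models is just a string, either one can be swapped out without retraining the other.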


I think of the logic that directs input through different models, plus the recursion in multi-pass inference, as similar to executive function in a brain. Models are getting better and smaller, to the point that you can run something like GPT-Neo 125M in under 2 GB of VRAM, and you can use many-shot prompts to get much higher quality results than zero-shot output from larger models.
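
For example, a many-shot prompt on the small model looks something like this (the classification task and the example reviews are made up, just to show the shape of it):

    # Many-shot prompting with a small model; GPT-Neo 125M fits in ~2 GB VRAM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

    prompt = (
        "Review: The soup was cold and bland. Sentiment: negative\n"
        "Review: Best pizza I've had in years. Sentiment: positive\n"
        "Review: Slow service, but the staff were friendly. Sentiment: mixed\n"
        "Review: The dessert menu was a pleasant surprise. Sentiment:"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=2, do_sample=False)
    print(tok.decode(out[0][ids.shape[1]:]))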

GPT-3-type models are very good at selecting for arbitrary qualities from a list of options. Generating a list of 10 candidate answers, then running prompts over those candidates to select for quality, accuracy, style, and so on, resembles the cyclic formulation of ideas in humans. The process used to write essays and articles - draft, edit, revise, simplify, repeat until satisfied - can be implemented trivially (see the sketch below). Those processes will transfer to larger models, and things like RETRO reduce the resources required by orders of magnitude.
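
The generate-then-select part, in pseudocode-ish Python. The complete() helper here is hypothetical - wrap whatever model you actually have behind it (GPT-3 API, GPT-Neo, etc.) - and the question, prompt wording, and 1-10 rating scheme are arbitrary:

    def complete(prompt: str) -> str:
        raise NotImplementedError("wrap your LLM of choice here")

    question = "Why does ice float on water?"

    # 1. Draft several candidate answers.
    candidates = [complete(f"Q: {question}\nA:") for _ in range(10)]

    # 2. Ask the model to grade each candidate on the quality we care about.
    def score(answer: str) -> int:
        reply = complete(
            f"Question: {question}\nAnswer: {answer}\n"
            "Rate the answer's accuracy from 1 to 10. Rating:"
        )
        digits = "".join(ch for ch in reply if ch.isdigit())
        return int(digits[:2]) if digits else 0

    # 3. Keep the best-rated candidate.
    best = max(candidates, key=score)
    print(best)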

"Cognitive architecture" seems like an accurate descriptor for this combination of multiple models and the logic layers that tie them together in many-shot, many-model development.

It may not be human level in zero-shot output, but how many humans produce human-level work in their stream of consciousness? The act of consideration - recursing over an idea and refining it - is achievable with these models, in a way that humans can debug and tweak from cycle to cycle.

Multi-pass "consideration" and revision methodologies can capture almost any meta-cognitive process used by humans, whether it's the Socratic method, the AP style guide, or an arbitrary jumble of rules derived from 4chan posters.
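
A sketch of that revision loop, reusing the hypothetical complete() wrapper from the earlier snippet. The rule set and the fixed number of passes are arbitrary stand-ins for whatever meta-cognitive process you want to encode:

    # Draft, critique against a rule set, revise, repeat.
    RULES = "Use active voice. Keep sentences under 20 words. Avoid jargon."

    def revise(task: str, passes: int = 3) -> str:
        draft = complete(f"Write a short paragraph: {task}\n")
        for _ in range(passes):
            critique = complete(
                f"Rules: {RULES}\nText: {draft}\nList every rule violation:"
            )
            draft = complete(
                f"Text: {draft}\nProblems: {critique}\n"
                "Rewrite the text, fixing the problems:"
            )
        return draft

Swap the RULES string for Socratic questions, a style guide, or anything else, and the same loop implements a different "consideration" process.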



