
At this point I'm comfortable putting transformers among the top three developments in machine learning history. The way things are headed, they may turn out to be one of the most important "discoveries" humankind has ever made.

I'm extremely optimistic about how transformers can recursively speed up progress in multiple areas of science. Transformers are reaching a point where they can demonstrate reasoning abilities in the ballpark of what you might expect from a human, and in certain areas they far exceed what any human is capable of. One of those areas is depth of knowledge: transformers (e.g. RETRO) can incorporate a library of knowledge far larger than any human can. Soon we will improve and harness this ability to the point where it may be pointless to form a scientific hypothesis without first "consulting" a large language model that has processed the entire library of scientific publications.



This paper shows we can combine models like Lego bricks, even without end-to-end training, by using language as an intermediate representation. That means more flexibility in training the models, each on its own dataset, and more ways they can be combined. By getting rid of fine-tuning, the models may retain their robustness to distribution shifts. Roughly what that looks like in code is sketched below.
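
A toy sketch of the idea, not the paper's actual pipeline: two frozen off-the-shelf models composed purely through text. The model names, the image file, and the prompt format are my own illustrative choices.

    # Two frozen models glued together with plain language: a captioner
    # turns an image into text, and a language model reasons over that text.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
    llm = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")

    caption = captioner("photo.jpg")[0]["generated_text"]   # vision -> language
    prompt = (
        f"Image description: {caption}\n"
        "Question: What is probably happening in this scene?\n"
        "Answer:"
    )
    answer = llm(prompt, max_new_tokens=40)[0]["generated_text"]  # language -> language
    print(answer)

Because the interface between the two models is just a string, either one can be swapped out without retraining the other.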


I think of the logic that directs input through different models, plus the recursion in multi-pass inference, as similar to executive function in a brain. Models are getting better and smaller, to the point that you can run something like GPT-Neo 125M in under 2 GB of VRAM, and you can use many-shot prompts to get much higher quality results than zero-shot output from larger models.
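
For example, a many-shot prompt on the small model looks something like this (the classification task and the example reviews are made up, just to show the shape of it):

    # Many-shot prompting with a small model; GPT-Neo 125M fits in ~2 GB VRAM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

    prompt = (
        "Review: The soup was cold and bland. Sentiment: negative\n"
        "Review: Best pizza I've had in years. Sentiment: positive\n"
        "Review: Slow service, but the staff were friendly. Sentiment: mixed\n"
        "Review: The dessert menu was a pleasant surprise. Sentiment:"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=2, do_sample=False)
    print(tok.decode(out[0][ids.shape[1]:]))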

GPT-3-type models are very good at selecting for arbitrary qualities from a list of options. Generating a list of 10 candidate answers, then running prompts over those candidates to select for quality, accuracy, style, and so on, resembles the cyclic formulation of ideas in humans. The process used to write essays and articles - draft, edit, revise, simplify, repeat until satisfied - can be implemented trivially (see the sketch below). Those processes will transfer to larger models, and things like RETRO reduce the resources required by orders of magnitude.
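
The generate-then-select part, in pseudocode-ish Python. The complete() helper here is hypothetical - wrap whatever model you actually have behind it (GPT-3 API, GPT-Neo, etc.) - and the question, prompt wording, and 1-10 rating scheme are arbitrary:

    def complete(prompt: str) -> str:
        raise NotImplementedError("wrap your LLM of choice here")

    question = "Why does ice float on water?"

    # 1. Draft several candidate answers.
    candidates = [complete(f"Q: {question}\nA:") for _ in range(10)]

    # 2. Ask the model to grade each candidate on the quality we care about.
    def score(answer: str) -> int:
        reply = complete(
            f"Question: {question}\nAnswer: {answer}\n"
            "Rate the answer's accuracy from 1 to 10. Rating:"
        )
        digits = "".join(ch for ch in reply if ch.isdigit())
        return int(digits[:2]) if digits else 0

    # 3. Keep the best-rated candidate.
    best = max(candidates, key=score)
    print(best)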

"Cognitive architecture" seems like an accurate descriptor for this combination of multiple models and the logic layers that tie them together in many-shot, many-model development.

It may not be human level in zero-shot output, but how many humans produce human-level work in their stream of consciousness? The act of consideration - recursing over an idea and refining it - is achievable with these models, in a way that humans can debug and tweak from cycle to cycle.

Multi-pass "consideration" and revision methodologies can capture almost any meta-cognitive process used by humans, whether it's the Socratic method, the AP style guide, or an arbitrary jumble of rules derived from 4chan posters.
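
A sketch of that revision loop, reusing the hypothetical complete() wrapper from the earlier snippet. The rule set and the fixed number of passes are arbitrary stand-ins for whatever meta-cognitive process you want to encode:

    # Draft, critique against a rule set, revise, repeat.
    RULES = "Use active voice. Keep sentences under 20 words. Avoid jargon."

    def revise(task: str, passes: int = 3) -> str:
        draft = complete(f"Write a short paragraph: {task}\n")
        for _ in range(passes):
            critique = complete(
                f"Rules: {RULES}\nText: {draft}\nList every rule violation:"
            )
            draft = complete(
                f"Text: {draft}\nProblems: {critique}\n"
                "Rewrite the text, fixing the problems:"
            )
        return draft

Swap the RULES string for Socratic questions, a style guide, or anything else, and the same loop implements a different "consideration" process.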



