> But that isn’t how language models work. LLMs model the distribution of words and phrases in a language as tokens. Their responses are nothing more than a statistically likely continuation of the prompt.
Not saying the author is wrong in general, but this kind of argument always annoys me. It's effectively a Forer statement for the "sceptics" side: it reads like a full-on refutation, but really says very little. It also evokes certain associations that are plainly incorrect.
LLMs are functions that return a probability distribution over the next word given the previous words; this distribution is derived from the training data. That much is true. But it tells us nothing about how the derivation and the probability generation actually work, or how simple or complex they are.
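For concreteness, here is what "a function that returns a probability distribution over the next word" amounts to at inference time. This is a minimal sketch assuming the Hugging Face transformers library and the small public "gpt2" checkpoint, chosen purely as an example:

```python
# Minimal sketch: an LLM maps the previous tokens to a probability
# distribution over the next token; generation is just sampling from it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # example model, assumption
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits                # (1, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)        # distribution over the next token

top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
```

Nothing in this interface says how the distribution is computed inside the model, which is exactly the point.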
What it does, however, is evoke two implicit assumptions without justifying them:
1) LLMs fundamentally cannot have humanlike intelligence, because humans are qualitatively different: An LLM is a mathematical model and a human is, well, a human.
Sounds reasonable until you take a look at the human brain and find that human consciousness and thought, too, could be described as nothing more than interactions between neurons. At which point it gets metaphysical...
2) It implies that because LLMs are "statistical models", they are essentially slightly improved Markov chains: to predict the next word, an LLM would just look up where the previous words appear most often in its training data and return whatever word follows there.
That's not how LLMs work at all. For starters, even the most extensive Markov chains have a context length of 3 or 4 words, while LLMs have context lengths of many thousands of tokens. The amount of training data required to build a Markov chain with a comparable context length would be in "number of atoms in the universe" territory, as the sketch below illustrates.
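To make the scaling point concrete, here is a toy illustration in plain Python. The 50,000-word vocabulary and the context lengths are assumptions chosen for illustration, not the parameters of any particular model:

```python
# Toy order-n Markov chain: literally a lookup table from n-word contexts
# to counts of the word that followed them in the training text.
from collections import defaultdict, Counter
import math
import random

def train_markov(words, order=2):
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])][words[i + order]] += 1
    return table

def next_word(table, context):
    counts = table.get(tuple(context))
    if not counts:
        return None                       # unseen context: the chain is stuck
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

corpus = "the cat sat on the mat and the cat slept on the mat".split()
table = train_markov(corpus, order=2)
print(next_word(table, ["the", "cat"]))   # 'sat' or 'slept'

# Why this cannot scale to LLM-sized contexts: the number of distinct
# contexts grows as vocab_size ** context_length.
vocab = 50_000                            # assumed vocabulary size
print(f"order 3:    ~10^{3 * math.log10(vocab):.0f} possible contexts")
print(f"order 2048: ~10^{2048 * math.log10(vocab):.0f} possible contexts")
# ~10^14 vs ~10^9623 -- the observable universe has roughly 10^80 atoms.
```

You could never observe, let alone store, enough text to populate a table like that for a 2048-word context, which is why the "it just looks it up in the training data" picture cannot be what an LLM does.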
Secondly, since current LLMs are built on the mathematical abstraction of neural networks, the relationship between the training data and the eventual model weights/parameters isn't even fully deterministic: the weights are initialized by a process that is independent of the training data - e.g. they are set to random values - and then incrementally adjusted so that the model replicates the training data better and better. This means that the "meaning" of individual weights and their relationship to the training data remains very unclear, and there is plenty of room in the model for higher-level "semantic" representations to evolve.
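A toy illustration of that random-initialization-plus-incremental-adjustment loop, in plain NumPy. The tiny two-layer network and the XOR target are assumptions chosen for illustration; nothing here is specific to LLMs, it just shows the mechanism:

```python
# Sketch of "random init, then incremental adjustment": two runs with
# different seeds fit the same data but end up with different weights.
import numpy as np

def train(seed, steps=5000, lr=0.5):
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 0.])            # XOR as a toy training target
    W1 = rng.normal(size=(2, 8))              # initial values independent of
    W2 = rng.normal(size=8)                   # the training data (random)
    for _ in range(steps):
        h = np.tanh(X @ W1)                   # hidden activations
        err = h @ W2 - y                      # prediction error
        # one gradient-descent step on the mean squared error
        grad_W2 = h.T @ err / len(y)
        grad_W1 = X.T @ ((err[:, None] * W2) * (1 - h ** 2)) / len(y)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, np.tanh(X @ W1) @ W2

W1_a, out_a = train(seed=0)
W1_b, out_b = train(seed=1)
print(np.round(out_a, 2), np.round(out_b, 2))  # both roughly reproduce 0 1 1 0
print(np.allclose(W1_a, W1_b))                 # False: different weights
```

The weights that come out depend on the random starting point as much as on the data, which is why reading a direct "meaning" into individual weights is so hard.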
None of that is proof that LLMs have "intelligence", but I think it does show that the question can't be simply dismissed by saying that LLMs are statistical models.