Do current NLP systems understand arithmetic, and can they do it with unfamiliar numbers they've never seen? If not, I'd think that your theory is demonstrably false, as a child can extrapolate mathematical axioms from just a few example problems, whereas NLP models are not able to do so.
>Do current NLP systems understand arithmetic, and can they do it with unfamiliar numbers they've never seen?
I don't know if that question is rhetorical or not, but GPT-3 can do basic math for problems it has not been directly trained on, and there's been a fair amount of debate, including right here at hn, about what the takeaway is supposed to be.
GPT-3 can't do arithmetic very well at all. There is a big, fat, extraordinary claim that it can in the GPT-3 paper, but it's only based on perfect accuracy on two-digit addition and subtraction, ~90% accuracy on three-digit addition and subtraction, and... around 20% accuracy on addition and subtraction of three to five digits, and on multiplication of two measly digits. Note: no division at all and no arithmetic with more than five digits. And very poor testing to ensure that the solved problems don't just happen to be in the model's training dataset to begin with, which is the simplest explanation of the reported results, given that the arithmetic problems GPT-3 solves correctly are the ones most likely to be found in a corpus of natural language (i.e. two- and three-digit addition and subtraction).
tl;dr, GPT-3 can't do basic math for problems it has not been directly trained on.
See section 3.9.1 and Figure 3.10. There is an additional category of problems of combinations of addition, subtraction and multiplication between three single-digit numbers. Performance is poor.
This is incorrect. The paper shows that it can do basic math for problems it has not been directly trained on. The only way to dispute that is to twist that very simple claim into something it's not, as you do here by talking about all the extra math that it can't do. If I say I can cut carrots, it doesn't mean I'm claiming to be a world class chef. I am disappointed that a claim as simple as this can be completely misunderstood.
How ironic you claim that the paper overstates it, when you very carefully leave out every single qualifier about BPEs and how GPT-3's arithmetic improves massively when numbers are reformatted to avoid BPE problems. Pot, kettle.
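For anyone unfamiliar with the BPE point: GPT-3's byte-pair-encoding tokenizer can merge a number like "2534" into multi-digit chunks, so the model never sees individual digits in aligned positions; inserting spaces forces a digit-per-token split. Here is a toy greedy-merge sketch of the effect (the merge table is made up for illustration, it is not GPT-3's actual vocabulary):

```python
# Hypothetical merge table: frequent multi-digit chunks a BPE
# tokenizer might have learned from web text. NOT GPT-3's real vocab.
TOY_MERGES = {"25", "34", "534"}

def toy_bpe(s):
    """Greedy longest-match segmentation against the toy merge table."""
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):
            # Fall back to a single character when no merge matches.
            if s[i:j] in TOY_MERGES or j == i + 1:
                out.append(s[i:j])
                i = j
                break
    return out

print(toy_bpe("2534"))     # chunked: digits of the two operands misalign
print(toy_bpe("2 5 3 4"))  # spaced: one digit per token, positions align
```

The point of the qualifier in the paper is that arithmetic accuracy improves when numbers are reformatted to avoid the first kind of split.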
Performing arithmetic per se is not impressive. A calculator can do it and the rules of arithmetic are not so complex that they can't be hand-coded, as they are, routinely. The extraordinary claim in the GPT-3 paper is that a language model is capable of performing arithmetic operations, rather than simply memorising their results [1]. Language models compute the probabilities of a token to follow from a sequence of tokens and in particular have no known ability to perform any arithmetic, so if GPT-3, which is a language model, were capable of doing something that it is not designed to do, then that would be very interesting indeed. Unfortunately, such an extraordinary claim is backed up with very poor evidence, and so amounts to nothing more than invoking magick.
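To make "language models compute the probabilities of a token to follow from a sequence of tokens" concrete, here is a minimal count-based bigram model, the simplest possible instance of the idea. It can only assign probability to continuations it has literally seen, which is exactly the memorisation hypothesis in miniature (the tiny corpus is obviously made up):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Estimate P(next | prev) from raw bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {p: {t: n / sum(c.values()) for t, n in c.items()}
            for p, c in counts.items()}

# Toy "training corpus" of arithmetic statements.
probs = train_bigram("2 + 2 = 4 . 3 + 3 = 6 .".split())

# After "=", the model splits probability between the answers it has
# memorised; a sum it has never seen gets probability zero.
print(probs["="])  # {'4': 0.5, '6': 0.5}
```

A neural language model smooths and generalises far beyond raw counts, which is precisely why the question of whether it computes or recalls an answer is interesting, and why it needs careful contamination testing to settle.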
__________
[1] From the paper linked above:
In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.
Now that I re-read this I'm struck by the extent to which the authors are willing to pull their results this way and that to force their preferred interpretation on them. Their model answers two-digit addition problems correctly? It's learned addition! Their model is making mistakes? It's because it's actually trying to compute the answer and failing! The much simpler explanation, that their model has memorised solutions to a few problems but there are many more it hasn't even seen, seems to be assigned a very, very low prior. Who cares about such a boring interpretation? Language models are few shot learners! (once trained with a few billion examples that is).
The funny thing with GPT-3 is that it comes up with sentences that seem to make correct deductions using arithmetic or other submodels of reality but it will blithely generate further sentences that contradict these apparent understandings. It's impressive and incoherent all at the same time.
I question the premise here. Can young children (say <5) who don't know arithmetic at all learn the axioms from just a bunch of examples? This isn't how children are taught arithmetic; they are taught the rules/algorithm not just a bunch of data-points. Do you have a source on the claim?