
I don't think I've seen an answer here that actually challenges this question. From my experience, I have yet to see a neural network actually learn representations outside the range in which it was trained. Some papers have tried things like sinusoidal activation functions, which can force a network to fit a repeating function, but on its own I'd call that a coincidence of the architecture rather than genuine extrapolation.
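If you want to poke at this yourself, here's a minimal sketch of that kind of experiment (the tiny tanh MLP, the full-batch gradient descent loop, and all hyperparameters are my own assumptions, not anything from this thread): fit sin(x) on [-pi, pi], then evaluate outside that range. In-range error typically ends up small while out-of-range error stays large, because the net learns the curve it saw rather than the periodic structure.

    # Toy extrapolation check: a small tanh MLP trained on sin(x) inside
    # [-pi, pi] and evaluated on [pi, 3*pi]. Everything here is illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    # Training data inside the range, test data outside it.
    x_train = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y_train = np.sin(x_train)
    x_test = np.linspace(np.pi, 3 * np.pi, 200).reshape(-1, 1)
    y_test = np.sin(x_test)

    # One hidden layer of 64 tanh units, trained with full-batch gradient descent.
    W1 = rng.normal(0, 0.5, (1, 64)); b1 = np.zeros(64)
    W2 = rng.normal(0, 0.1, (64, 1)); b2 = np.zeros(1)
    lr, n = 0.01, len(x_train)

    for step in range(30000):
        h = np.tanh(x_train @ W1 + b1)          # forward pass
        pred = h @ W2 + b2
        dpred = 2 * (pred - y_train) / n        # gradient of mean squared error
        dW2 = h.T @ dpred;  db2 = dpred.sum(axis=0)
        dh = dpred @ W2.T
        dz = dh * (1 - h ** 2)                  # tanh derivative
        dW1 = x_train.T @ dz;  db1 = dz.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    def mse(x, y):
        return float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))

    # Typically: tiny error in range, large error out of range -- the net has
    # not picked up the periodicity, only the shape of the segment it saw.
    print("in-range MSE:    ", mse(x_train, y_train))
    print("out-of-range MSE:", mse(x_test, y_test))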

On generalization - it's still memorization. I think there has been some evidence that ChatGPT does 'try' to perform some higher-level reasoning but still has problems because of the dictionary-style lookup it relies on. The higher-level thinking or AGI that people are excited about is a form of generalization so impressive that we don't really think of it as memorization. But I question whether our own drive to generate original thought is actually as separate from what we're currently seeing as we assume.



> I have yet to see a neural network actually learn representations outside the range in which it was trained

Generalization doesn't require learning representations outside of the training set. It requires learning reusable representations that compose in ways that enable solving unseen problems.

> On generalization - it's still memorization

Not sure what you mean by this. The statement sounds self-contradictory to me. Generalization requires abstraction / compression. Not sure if that's what you mean by memorization.

Overparameterized models are able to generalize (and tend to, when trained appropriately) because there are far more parameterizations that minimize loss by compressing knowledge than there are parameterizations that minimize loss without compression.

This is fairly easy to see. Imagine a dataset and model such that the model has barely enough capacity to learn the dataset without compression. The only degrees of freedom would be through changes in basis. In contrast, if the model uses compression, that would increase the degrees of freedom. The more compression, the more degrees of freedom, and the more parameterizations that would minimize the loss.
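For what it's worth, you can make this degrees-of-freedom point concrete in the simplest possible setting, a linear bottleneck "network" W2 @ W1 (my own toy construction, not anything from the comment above): at an exact solution, the null space of the Jacobian of (W1, W2) -> W2 @ W1 counts the locally loss-preserving directions, and a solution whose hidden layer only uses part of the bottleneck (compression) sits on a higher-dimensional flat set than one that uses all of it.

    # Toy degrees-of-freedom count for a linear bottleneck model W2 @ W1.
    # The local dimension of the solution set is used as a rough proxy for
    # "how many" parameterizations of each kind there are.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k, r = 8, 4, 2   # data dimension, bottleneck width, true rank of the data

    # Rank-2 "dataset": the linear map the model has to reproduce exactly.
    U = rng.normal(size=(n, r))
    V = rng.normal(size=(r, n))
    M = U @ V

    def flat_directions(W1, W2):
        """Null-space dimension of the Jacobian of (W1, W2) -> W2 @ W1,
        i.e. how many directions leave the (zero) loss unchanged locally."""
        J = np.zeros((n * n, 2 * n * k))
        for idx in range(k * n):                  # perturb entries of W1
            dW1 = np.zeros(k * n); dW1[idx] = 1.0
            J[:, idx] = (W2 @ dW1.reshape(k, n)).ravel()
        for idx in range(n * k):                  # perturb entries of W2
            dW2 = np.zeros(n * k); dW2[idx] = 1.0
            J[:, k * n + idx] = (dW2.reshape(n, k) @ W1).ravel()
        return 2 * n * k - np.linalg.matrix_rank(J)

    # Compressed solution: the hidden layer only uses r of its k dimensions.
    W1_c = np.vstack([V, np.zeros((k - r, n))])
    W2_c = np.hstack([U, np.zeros((n, k - r))])

    # Uncompressed solution of the same problem: the hidden layer uses all k
    # dimensions and the output layer discards the extra ones.
    W1_u = np.vstack([V, rng.normal(size=(k - r, n))])
    W2_u = np.hstack([U, np.zeros((n, k - r))])

    assert np.allclose(W2_c @ W1_c, M) and np.allclose(W2_u @ W1_u, M)
    print("flat directions at compressed minimum:  ", flat_directions(W1_c, W2_c))
    print("flat directions at uncompressed minimum:", flat_directions(W1_u, W2_u))

In this toy case the compressed solution has strictly more flat directions than the uncompressed one, which is the same counting argument in miniature.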

If stochastic gradient descent is roughly as likely to find any given compressed minimum as any given uncompressed one, then the fact that there are exponentially more compressed minima than uncompressed minima means it will tend to find a compressed minimum.

Of course this is only a probabilistic argument, and doesn't guarantee compression / generalization. And in fact we know that there are ways to train a model such that it will not generalize, such as training for many epochs on a small dataset without augmentation.
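A quick way to see that failure mode in miniature, swapping the neural net for an overparameterized polynomial purely to keep the code short (my substitution, not the parent's example): with as many coefficients as data points and no regularization or augmentation, training error goes to essentially zero (the noise is memorized) while error on held-out points from the same interval is typically far larger.

    # Memorization in miniature: exact interpolation of a small noisy dataset.
    import numpy as np

    rng = np.random.default_rng(1)

    # A small, noisy training set and a clean held-out set from the same interval.
    x_train = np.sort(rng.uniform(-1, 1, 8))
    y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=8)
    x_test = np.sort(rng.uniform(-1, 1, 200))
    y_test = np.sin(3 * x_test)

    # Degree-7 polynomial: 8 coefficients for 8 points, i.e. exact interpolation.
    coeffs = np.polyfit(x_train, y_train, deg=7)

    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

    # Train error is ~0 (the noise has been memorized); test error is typically
    # much larger, especially near the edges of the interval.
    print("train MSE:", train_mse)
    print("test MSE: ", test_mse)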


The issue is that we are prone to inflate the complexity of our own processing logic. Ultimately we are pattern recognition machines combined with a capacity for abstract representation. This allows us to connect the dots between events in the world and apply principles from one domain to another.

But, like all complexity, it is reducible to component parts.

(In fact, we know this because we evolved to have this ability.)


Calling us "pattern recognition machines capable of abstract representation" is, I think, correct, but it's a (rather) broad description of what we can do and not really a comment on how our minds work. Sure, from personal observation, it seems like we sometimes overcomplicate self-analysis ("I'm feeling bad – why? oh, there are these other things that happened and related problems I have, and maybe they're all manifestations of one or two deeper problems, &c." when in reality I'm just tired or hungry), but that seems like evidence that we're both simpler than we think and more complex than you'd expect (so much mental machinery for such straightforward problems!).

I read Language in Our Brain [1] recently and I was amazed by what we've learned about the neurological basis of language, but I was even more astounded at how profoundly little we know.

> But, like all complexity, it is reducible to component parts.

This is just false, no? Sometimes horrendously complicated systems are made of simple parts that interact in ways that are intractable to predict or that defy reduction.

[1] https://mitpress.mit.edu/9780262036924/language-in-our-brain
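As a concrete (and admittedly toy, nothing to do with brains) illustration of that last point about simple parts: an elementary cellular automaton like Rule 110 has an update rule that is literally an 8-entry lookup table, yet its global behavior is rich enough to be Turing-complete, so "reducible to component parts" buys you very little predictive power.

    # Rule 110: each cell's next state is a function of itself and its two
    # neighbors, read from an 8-entry lookup table derived from the rule number.
    import numpy as np

    RULE = 110
    WIDTH, STEPS = 64, 32

    # Bit i of the rule number gives the next state for the neighborhood whose
    # (left, center, right) bits spell out the number i.
    table = np.array([(RULE >> i) & 1 for i in range(8)], dtype=np.uint8)

    state = np.zeros(WIDTH, dtype=np.uint8)
    state[-1] = 1  # single live cell at the right edge

    for _ in range(STEPS):
        print("".join("#" if c else "." for c in state))
        left = np.roll(state, 1)      # periodic boundary conditions
        right = np.roll(state, -1)
        neighborhood = (left << 2) | (state << 1) | right
        state = table[neighborhood]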



