There exists the Universal (Function) Approximation Theorem for neural networks ...

montebicyclelo · on July 1, 2024

People throw that proof around all the time; but all it does is show that a neural net is equivalent to a lookup table; and a lookup table with enough memory can approximate any function. It's miles away from explaining how real world, useful, neural nets, like conv-nets, transformers, LSTMs, etc. actually work.

jb1991 · on July 1, 2024

FYI, there are actually many algorithms going back longer than the neural network algorithm that have been proven to be a universal function approximator. Neural networks are certainly not the only and not the first to do so. There are quite a few that are actually much more appropriate for many cases than a neural network.

derangedHorse · on July 1, 2024

What other algorithms can do this and which situations would they be more useful than neural networks?

dontlikeyoueith · on July 2, 2024

The Taylor Series dates to 1715. Fourier Series dates to the 1820s.

Both are universal function approximators and both can be learned via gradient descent.

For the case where the function you want to learn actually is polynomial or periodic (respectively), these are better than neural networks.

camjw · on July 2, 2024

For your interest, Taylor Series are not universal function approximators - the Taylor Series around 0 for

f(x) = e^(-1/x^2) if x != 0 else 0

is identically zero (all partial derivatives are 0 at 0) but the function is clearly not identically zero. So the radius of convergence for this Taylor series is infinite but it only equals the approximated function at one point.

I'm sure there are some conditions you can put on f to make the Taylor Series a UFA but it's been quite a while since I did any real analysis so I have forgotten!

Doesn't detract from the overall point though that there are UFAs that are not neural nets. I should say that I don't know what the precise definition of a UFA really is, but I assume you have to have more than equality at one point.

Horffupolde · on July 2, 2024

Taylor series work on differentiable intervals. You specifically chose a function and interval where this is not true. Of course it will not be a good approximation.

dontlikeyoueith · on July 2, 2024

I'm pretty sure the UFA theorems for neural networks wouldn't apply to that function either: https://en.wikipedia.org/wiki/Universal_approximation_theore...

Generally, they assume the function to be approximated is continuous.

gnyeki · on July 1, 2024

This area is covered by non-parametric statistics more generally. There are many other methods to non-parametrically estimate functions (that satisfy some regularity conditions). Tree-based methods are one family of such methods, and the consensus still seems to be that they perform better than neural networks on tabular data. For example:

https://arxiv.org/abs/2106.03253

jb1991 · on July 1, 2024

https://www.quora.com/Is-there-any-universal-function-approx...

someoneontenet · on July 1, 2024

Newtons Method approximates square roots. Its useful if you want to approximate something like that without pulling in the computational power required of NN.

astrobe_ · on July 1, 2024

I think the problem to solve is more like : given a set of inputs and outputs, find a function that gives the expected output for each input [1]. This is like Newton's method on a higher order ;-). One can find such a tool in Squeak or Pharo Smalltalk, IIRC.

[1] https://stackoverflow.com/questions/1539286/create-a-functio...

jb1991 · on July 2, 2024

By definition, that’s not a “universal“ function approximator.

kristjansson · on July 1, 2024

Newton's method related to universal function approximation in the same way a natural oil seep is related to a modern IC engine...

richrichie · on July 1, 2024

Not any function though. There are restrictions on type of functions "universal" approximation theorem is applicable for. Interestingly, the theorem is about a single layer network. In practice, that does not work as well as having many layers.

visarga · on July 1, 2024

They can model only continuous functions, more specifically any continuous function on compact subsets of ℝⁿ. They can approximate functions to an arbitrary level of accuracy, given sufficient neurons

arketyp · on July 1, 2024

Makes you wonder what is meant by learning...

dekhn · on July 1, 2024

Learning is using observations to create/update a model that makes predictions which are more accurate than chance. At some point the model ends up having generalizability beyond the domain.