There exists the Universal (Function) Approximation Theorem for neural networks — which states that they can represent/encode any function to a desired level of accuracy[0].
However there does not exist a theorem stating that those approximations can be learned (or how).
People throw that proof around all the time; but all it does is show that a neural net is equivalent to a lookup table; and a lookup table with enough memory can approximate any function. It's miles away from explaining how real world, useful, neural nets, like conv-nets, transformers, LSTMs, etc. actually work.
FYI, there are actually many algorithms going back longer than the neural network algorithm that have been proven to be a universal function approximator. Neural networks are certainly not the only and not the first to do so. There are quite a few that are actually much more appropriate for many cases than a neural network.
For your interest, Taylor Series are not universal function approximators - the Taylor Series around 0 for
f(x) = e^(-1/x^2) if x != 0 else 0
is identically zero (all partial derivatives are 0 at 0) but the function is clearly not identically zero. So the radius of convergence for this Taylor series is infinite but it only equals the approximated function at one point.
I'm sure there are some conditions you can put on f to make the Taylor Series a UFA but it's been quite a while since I did any real analysis so I have forgotten!
Doesn't detract from the overall point though that there are UFAs that are not neural nets. I should say that I don't know what the precise definition of a UFA really is, but I assume you have to have more than equality at one point.
Taylor series work on differentiable intervals. You specifically chose a function and interval where this is not true. Of course it will not be a good approximation.
This area is covered by non-parametric statistics more generally. There are many other methods to non-parametrically estimate functions (that satisfy some regularity conditions). Tree-based methods are one family of such methods, and the consensus still seems to be that they perform better than neural networks on tabular data. For example:
Newtons Method approximates square roots. Its useful if you want to approximate something like that without pulling in the computational power required of NN.
I think the problem to solve is more like : given a set of inputs and outputs, find a function that gives the expected output for each input [1]. This is like Newton's method on a higher order ;-). One can find such a tool in Squeak or Pharo Smalltalk, IIRC.
Not any function though. There are restrictions on type of functions "universal" approximation theorem is applicable for.
Interestingly, the theorem is about a single layer network. In practice, that does not work as well as having many layers.
They can model only continuous functions, more specifically any continuous function on compact subsets of ℝⁿ. They can approximate functions to an arbitrary level of accuracy, given sufficient neurons
Learning is using observations to create/update a model that makes predictions which are more accurate than chance. At some point the model ends up having generalizability beyond the domain.
However there does not exist a theorem stating that those approximations can be learned (or how).
[0] https://en.m.wikipedia.org/wiki/Universal_approximation_theo...