
>often gives the wrong intuition when thinking about applying gradient descent in a high-dimensional parameter space

Can you give some examples?



Not an expert in this field, but I'm guessing this is related to the unintuitive nature of geometry in high-dimensional spaces.

One rough example I can think of: the number of directions in which you can move away from the origin grows exponentially with dimensionality. There is _way_ more volume far from the origin than near it; I've seen this explained as "most of the volume of a high-dimensional orange is in the peel". One consequence is that samples from a standard Gaussian form a thin "shell" rather than the "ball" you might expect (in general this phenomenon is called concentration of measure).
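You can see this "shell" effect numerically in a few lines. This is just a quick illustrative sketch with NumPy (the sample count and dimensions are arbitrary choices): the norm of a standard Gaussian sample in d dimensions concentrates around sqrt(d), and the spread of the norms relative to their mean shrinks as d grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# For a standard Gaussian in d dimensions, the norm of a sample
# concentrates around sqrt(d): almost all samples sit in a thin
# "shell" rather than near the origin.
for d in (2, 100, 10_000):
    x = rng.standard_normal((1000, d))          # 1000 samples in d dims
    norms = np.linalg.norm(x, axis=1)
    print(f"d={d:>6}  mean norm={norms.mean():8.2f}  "
          f"relative spread={norms.std() / norms.mean():.4f}")
```

The mean norm tracks sqrt(d), while the relative spread keeps shrinking, which is exactly the "all the volume is in the peel" picture.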

Also, very roughly, high-dimensional objects have lots of corners, and those corners are very sharp. I would guess gradient descent can get stuck near these corners and have a hard time getting out.

Some links related to this:

- Spikey Spheres: http://www.penzba.co.uk/cgi-bin/PvsNP.py?SpikeySpheres#HN2

- Thinking outside the 10-dimensional box - 3blue1brown: https://www.youtube.com/watch?v=zwAD6dRSVyI

- This is a fairly long talk about HMC, but it does talk about some problems that come up when sampling high-dimensional distributions: https://www.youtube.com/watch?v=pHsuIaPbNbY


Probably there are very few local minima: at a critical point, the curvature must be positive in every direction at once for it to be a minimum, and the chance of that decreases with the number of dimensions, so most critical points end up being saddle points.
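One crude way to illustrate this (a toy model, not any particular loss surface): treat the Hessian at a critical point as a random symmetric matrix with Gaussian entries, and check how often all its eigenvalues come out positive, i.e. how often the critical point would be a local minimum rather than a saddle.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2000

# Toy model: a random symmetric matrix stands in for the Hessian at a
# critical point. "All eigenvalues positive" means local minimum; any
# negative eigenvalue means saddle. The minimum case gets rare fast.
for d in (1, 2, 4, 8):
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((d, d))
        h = (a + a.T) / 2.0                     # symmetrize
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    print(f"d={d}  fraction of positive-definite Hessians: {count / trials:.3f}")
```

At d=1 roughly half the "critical points" are minima; by d=8 essentially all of them are saddles, which matches the intuition above.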




