> Second-order optimizers and natural gradient methods
Do second order optimizers help improve data efficiency? I assumed they’d help you get to the same minimum faster (but this is way outside my wheelhouse).
yes! typically the optimizer that trains faster also get better data efficiency. it maybe not be absolutely true, but that has been my observation so far. also see https://arxiv.org/pdf/2510.09378 for second-order methods.
Fundamentally I don't believe second-order methods get better data efficiency by itself, but changes to the optimizer can because the convergence behavior changes. ML theory lags behind the results in practice.
> Second-order optimizers and natural gradient methods
Do second order optimizers help improve data efficiency? I assumed they’d help you get to the same minimum faster (but this is way outside my wheelhouse).