> Directions we think are wide open > Second-order optimizers and natural gradie...

sdpmas · 2026-03-05T00:35:53

Yes! Typically the optimizer that trains faster also gets better data efficiency. It may not be universally true, but that has been my observation so far. Also see https://arxiv.org/pdf/2510.09378 for second-order methods.
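This toy sketch makes the "trains faster" part concrete by showing why second-order methods converge faster on ill-conditioned problems. It uses a quadratic with Hessian A = diag(1, 100), an arbitrary illustrative choice. Plain gradient descent with step 1/L needs many iterations along the flat direction. A Newton step rescales the gradient by the inverse Hessian and lands on the minimizer immediately. This only illustrates optimization speed on a quadratic. It says nothing about data efficiency, which is the point under debate here.

```python
# Toy quadratic f(w) = 0.5 * (a1*w1^2 + a2*w2^2) - (b1*w1 + b2*w2)
# with a diagonal, ill-conditioned Hessian A = diag(a1, a2).
A = (1.0, 100.0)  # Hessian diagonal (condition number 100), illustrative choice
B = (1.0, 1.0)    # linear term, illustrative choice
W_STAR = tuple(b / a for a, b in zip(A, B))  # exact minimizer A^{-1} b


def grad(w):
    """Gradient A w - b of the diagonal quadratic."""
    return tuple(a * wi - b for a, wi, b in zip(A, w, B))


def dist(w):
    """Euclidean distance from w to the minimizer."""
    return sum((wi - si) ** 2 for wi, si in zip(w, W_STAR)) ** 0.5


def gd_steps(tol=1e-6, max_iter=100_000):
    """Plain gradient descent with step 1/L, where L is the largest curvature."""
    lr = 1.0 / max(A)
    w = (0.0, 0.0)
    for k in range(max_iter):
        if dist(w) < tol:
            return k
        w = tuple(wi - lr * gi for wi, gi in zip(w, grad(w)))
    return max_iter


def newton_steps(tol=1e-6, max_iter=100):
    """Newton's method: precondition the gradient by the inverse Hessian."""
    w = (0.0, 0.0)
    for k in range(max_iter):
        if dist(w) < tol:
            return k
        w = tuple(wi - gi / a for wi, gi, a in zip(w, grad(w), A))
    return max_iter


print("GD steps:", gd_steps())
print("Newton steps:", newton_steps())
```

On a non-quadratic loss, Newton's method is no longer exact in one step. Practical second-order optimizers also approximate the Hessian (or Fisher) rather than inverting it exactly.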

vladf · 2026-03-05T02:33:19

That still looks like a “converge faster” paper.

https://arxiv.org/abs/2006.10732

The above provides a nuanced theoretical view: GD's inductive bias is probably better unless your model is misspecified.
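Here is a minimal sketch of the inductive-bias point. It assumes a toy overparameterized least-squares problem with one training example and two weights; all data here is hypothetical. Starting from zero, plain GD converges to the minimum Euclidean-norm interpolant. GD preconditioned by a fixed diagonal P instead converges to the interpolant that minimizes `w^T P^{-1} w`. The choice P = diag(4, 1) is an arbitrary stand-in, not a Fisher or Hessian estimate. Both solutions fit the training point exactly but disagree on a new input. Which one generalizes better therefore depends on how the true target aligns with P. The linked paper studies this kind of trade-off in a much more general regression setting; this toy does not reproduce its results.

```python
# Overparameterized least squares: one training example, two weights,
# so every w with w1 + w2 = 1 fits the training data exactly.
X = (1.0, 1.0)  # single training input (hypothetical toy data)
Y = 1.0         # its target (hypothetical)


def fit(precond, lr=0.1, steps=2000):
    """Preconditioned GD on 0.5 * (x.w - y)^2, starting from w = 0.

    precond is the diagonal of P; precond = (1, 1) is plain GD.
    """
    w = [0.0, 0.0]
    for _ in range(steps):
        residual = sum(xi * wi for xi, wi in zip(X, w)) - Y
        # The gradient is residual * x; the update is w -= lr * P @ gradient.
        w = [wi - lr * pi * residual * xi for wi, pi, xi in zip(w, precond, X)]
    return w


def predict(w, x):
    """Linear model prediction x.w."""
    return sum(xi * wi for xi, wi in zip(x, w))


w_gd = fit((1.0, 1.0))   # plain GD: minimum Euclidean-norm interpolant
w_pgd = fit((4.0, 1.0))  # P = diag(4, 1): minimizes w^T P^{-1} w instead

x_test = (1.0, 0.0)  # hypothetical unseen input
print("GD:", w_gd, "test prediction:", predict(w_gd, x_test))
print("preconditioned:", w_pgd, "test prediction:", predict(w_pgd, x_test))
```

Both runs reach zero training loss. The iterates stay in the span of `P x`, though, so the preconditioner alone decides which interpolant you end up with.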

alyxya · 2026-03-05T01:21:46

Fundamentally, I don't believe second-order methods get better data efficiency by themselves, but changes to the optimizer can, because they change the convergence behavior. ML theory lags behind the results in practice.