Hacker News

I read the paper yesterday, would recommend it. Kudos to the authors for getting to these results, and also for presenting them in a polished way. It's nice to follow the arguments about the alternating attention (global across all tokens vs. only the tokens per camera), the normalization (normalizing the scene scale in the data, vs. DUSt3R, which normalizes in the network), and the tokens (image tokens from DINOv2 + camera tokens + additional register tokens, with the first camera handled differently since it becomes the frame of reference). The results are amazing, and fine-tuning this model will be fun, e.g. for feed-forward 3DGS reconstruction; looking forward to that.
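The alternating-attention idea described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: single-head attention with identity projections, and made-up shapes (F frames, T tokens per frame, dimension D), just to show which tokens can see which in each of the two alternating layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (N, D). Single-head attention with identity Q/K/V projections,
    # purely to illustrate the attention pattern, not a real layer.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def alternating_block(tokens):
    # tokens: (F, T, D) -- F frames/cameras, T tokens per frame.
    F, T, D = tokens.shape
    # Frame-wise attention: each token attends only within its own frame.
    framewise = np.stack([self_attention(tokens[f]) for f in range(F)])
    # Global attention: flatten all frames into one sequence so every
    # token attends to every token across all frames.
    flat = framewise.reshape(F * T, D)
    return self_attention(flat).reshape(F, T, D)

out = alternating_block(np.random.randn(4, 16, 32))
print(out.shape)  # (4, 16, 32)
```

The real model interleaves many such frame-wise/global pairs with learned projections, plus the camera and register tokens mentioned above.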

I'm sure getting to this point was quite difficult, and on the project page you can read how it involved discussions with lots and lots of smart and capable people. But there's no big "aha" moment in the paper, so it feels like another hit for The Bitter Lesson in the end: They used a giant bunch of [data], a year and a half of GPU time to [train] the final model, and created a model with a billion parameters that outperforms all specialized previous models.

Or in the words of the authors, from the paper:

> We also show that it is unnecessary to design a special network for 3D reconstruction. Instead, VGGT is based on a fairly standard large transformer [119], with no particular 3D or other inductive biases (except for alternating between frame-wise and global attention), but trained on a large number of publicly available datasets with 3D annotations.

Fantastic to have this. But it feels... yes, somewhat bitter.

[The Bitter Lesson]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html (often discussed on HN)

[data]: "Co3Dv2 [88], BlendMVS [146], DL3DV [69], MegaDepth [64], Kubric [41], WildRGB [135], ScanNet [18], HyperSim [89], Mapillary [71], Habitat [107], Replica [104], MVS-Synth [50], PointOdyssey [159], Virtual KITTI [7], Aria Synthetic Environments [82], Aria Digital Twin [82], and a synthetic dataset of artist-created assets similar to Objaverse [20]."

[train]: "The training runs on 64 A100 GPUs over nine days", that would be around $18k on Lambda Labs, in case you're wondering
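The $18k figure checks out as a back-of-envelope estimate, assuming an on-demand A100 rate of roughly $1.29/GPU-hour (rates vary by provider and over time; the exact rate is my assumption, not from the thread):

```python
# 64 A100s for 9 days, at an assumed ~$1.29 per GPU-hour.
gpus, days, rate = 64, 9, 1.29
gpu_hours = gpus * days * 24
print(gpu_hours)                # 13824 GPU-hours
print(round(gpu_hours * rate))  # 17833, i.e. ~$18k
```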



Give it another year and we will have a more specialised architecture tailored to 3D that reaches similar accuracy. VGGT is ground-breaking research, but it is in a way brute force. There is plenty of work to do to make it more efficient.


Doesn't the bitter lesson take the argument a bit too far by opposing search/learning to heuristics? Isn't the former dependent on breakthroughs in the latter?


The bitter lesson is the opposite. It argues that hand-crafted heuristics will eventually get beaten by more general learning algorithms that can take advantage of computing power.


Indeed, even "classical chess engines" like Stockfish, which previously relied on handcrafted heuristics at leaf nodes, have in recent years been greatly improved by NNUE evaluation [1] [2]. Note that this is a completely different approach from the one AlphaZero takes, and modern Stockfish is significantly stronger than AlphaZero.

[1] https://stockfishchess.org/blog/2020/introducing-nnue-evalua...

[2] https://www.chessprogramming.org/Stockfish_NNUE


> eventually get beaten

Brute forcing is bound to find paths beyond heuristics. What I'm getting at is that the path needs to be established first before it can be beaten. Hence I'm wondering whether one isn't an extension of the other rather than an opposing strategy.

I.e. search and heuristics both have a time and place, not so much a bitter lesson but a common filter for a next iteration to pass through.


That's like saying horse drawn carriages aren't opposed to cars because they needed to be developed first.


> They used a giant bunch of [data], a year and a half of GPU time to [train] the final model,

>[train]: "The training runs on 64 A100 GPUs over nine days", that would be around $18k on lambda labs in case you're wondering

How is that a "year and a half of GPU time"? Maybe on some exoplanet?


> > [train]: "The training runs on 64 A100 GPUs over nine days",

> How is that a "year and half of GPU time".

64 GPUs × 9 days = 576 GPU-days ≈ 1.577 GPU-years


Doh, that's entirely fair: haven't been in this thread yet, but would echo what I perceive as implicit puzzlement re: this amount of GPU time being described as bitter-lesson-y.



