It matches my anecdotal experience here (from Germany). When talking about travelling to the US, the discussions of drawbacks used to circle around climate impact, expense, or flight time. Recently, this has shifted to feeling unwelcome or unsafe. Just last week I talked to two different people who had gone to the US regularly in past years and have decided not to go this year (one to attend a wedding, the other for tourism), due to the current political climate.
Certainly possible; see Section 4.6, "Finetuning for Downstream Tasks", in the paper, whose first subsection is "Feed-forward Novel View Synthesis". They chose to report their experiments on LVSM, which is not an explicit representation like 3D Gaussian Splatting, but they cite two feed-forward 3DGS approaches in their state-of-the-art listing.
Should be quite exciting going forward, as fine-tuning might be possible on consumer hardware / single desktop machines (like it is with LLMs). So I would expect a lot of experiments coming out in this space soon-ish. If the results hold up, it'll be pretty exciting to drop slow and cumbersome COLMAP processing and scene optimization for a single forward pass that takes a few seconds.
I read the paper yesterday and would recommend it. Kudos to the authors for getting to these results, and also for presenting them in a polished way. It's nice to follow the arguments about the alternating attention (global across all tokens vs. only the tokens per camera), the normalization (normalizing the scene scale is done in the data, vs. DUST3R, which normalizes in the network), and the tokens (image tokens from DINOv2 + camera tokens + additional register tokens, with the first camera handled differently as it becomes the frame of reference). The results are amazing, and fine-tuning this model will be fun, e.g. for feed-forward 3DGS reconstruction; looking forward to that.
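In case the alternating attention is hard to picture, here's a rough toy sketch of the idea (my own code, not the authors'; the shapes and names are just assumptions): one block lets each frame's tokens attend only to each other, the next block lets all tokens from all frames attend globally.

```python
# Toy sketch of alternating frame-wise / global attention (not the VGGT code).
import torch
import torch.nn as nn

class AlternatingAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.frame_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, tokens_per_frame, dim)
        b, f, t, d = x.shape
        # Frame-wise attention: each frame's tokens only see each other.
        xf = x.reshape(b * f, t, d)
        xf, _ = self.frame_attn(xf, xf, xf)
        x = xf.reshape(b, f, t, d)
        # Global attention: all tokens across all frames attend to each other.
        xg = x.reshape(b, f * t, d)
        xg, _ = self.global_attn(xg, xg, xg)
        return xg.reshape(b, f, t, d)

# Example: 2 scenes, 4 cameras each, 256 tokens per camera, 512-dim features
out = AlternatingAttention(512)(torch.randn(2, 4, 256, 512))
```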
I'm sure getting to this point was quite difficult, and on the project page you can read how it involved discussions with lots and lots of smart and capable people. But there's no big "aha" moment in the paper, so it feels like another hit for The Bitter Lesson in the end: they used a giant bunch of data and a year and a half of GPU time to train the final model, and created a model with a billion parameters that outperforms all previous specialized models.
Or in the words of the authors, from the paper:
> We also show that it is unnecessary to design a special network for 3D reconstruction. Instead, VGGT is based on a fairly standard large transformer [119], with no particular 3D or other inductive biases (except for alternating between frame-wise and global attention), but trained on a large number of publicly available datasets with 3D annotations.
Fantastic to have this. But it feels... yes, somewhat bitter.
Give it another year and we will have a more specialised architecture tailored to 3D that reaches similar accuracy. VGGT is ground-breaking research, but it is in a way brute force. There is plenty of work to do to make it more efficient.
Doesn't the bitter lesson take the argument a bit too far by opposing search/learning to heuristics? Is the former not dependent on breakthroughs in the latter?
The bitter lesson is the opposite. It argues that hand-crafted heuristics will eventually get beaten by more general learning algorithms that can take advantage of computing power.
Indeed, even "classical chess engines" like Stockfish, which previously required handcrafted heuristics at the leaf nodes, have moved on: in recent years the NNUE [1] [2] evaluation has greatly outperformed the handcrafted one. Note that this is a completely different approach from the one AlphaZero takes, and modern Stockfish is significantly stronger than AlphaZero.
Brute forcing is bound to find paths beyond heuristics. What I'm getting at is that the path needs to be established first before it can be beaten. Hence I'm wondering whether one isn't an extension of the other rather than an opposing strategy.
I.e. search and heuristics both have a time and a place; not so much a bitter lesson as a common filter for the next iteration to pass through.
Doh, that's entirely fair: haven't been in this thread yet, but would echo what I perceive as implicit puzzlement re: this amount of GPU time being described as bitter-lesson-y.
Sure they can. o3-mini can do web searches, which puts it far ahead of o1 if you require current information. You can also tell it to go read a particular paper from just the rough name.
If you have things organized neatly together, you can also use pre-existing compression algorithms, like JPEG, to compress your data. That's what we're doing in Self-Organizing Gaussians [0]. There we take an unorganised (noisy) set of primitives that have 59 attributes and sort them into 59 2D grids which are locally smooth. Then we use off-the-shelf image formats to store the attributes. It's an incredibly effective compression scheme, and quite simple.
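To sketch the idea (this is just toy code from me, not the actual implementation; the real sorting is much smarter, and the helper names here are made up): arrange the primitives on a 2D grid so that neighbours are similar, then store each attribute plane with a standard image codec.

```python
# Toy sketch: sort primitives so neighbours on a 2D grid are similar,
# then compress each attribute plane with an off-the-shelf image format.
import numpy as np
from PIL import Image

def to_grids(attrs: np.ndarray, side: int) -> np.ndarray:
    # attrs: (N, K) unordered primitives with K attributes, N == side * side.
    # Stand-in for the real smoothness-optimizing sort: just order by the
    # first few attributes so nearby cells tend to hold similar primitives.
    order = np.lexsort(attrs[:, :3].T)
    return attrs[order].reshape(side, side, -1)

def save_planes(grids: np.ndarray, prefix: str, quality: int = 90) -> None:
    # One image per attribute: quantize each plane to 8 bit and save as JPEG.
    for k in range(grids.shape[-1]):
        plane = grids[..., k]
        lo, hi = plane.min(), plane.max()
        img = ((plane - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
        Image.fromarray(img).save(f"{prefix}_{k:02d}.jpg", quality=quality)

# Example: a 256x256 grid of primitives with 59 attributes each
attrs = np.random.rand(256 * 256, 59).astype(np.float32)
save_planes(to_grids(attrs, 256), "attr")
```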
So they want to build a hollow structure, 1 km high, with all its weight concentrated at the very top (when charged)? How is that supposed to not immediately collapse?
it's not a hollow structure, it's space set aside in an otherwise normal commercial/residential tower. like extra banks of elevators. even if it weren't though, the weight of the building would far outweigh the weight of the blocks.
Absolute pitch: a completely useless skill, and having it can in some cases even be detrimental. While also being very hard, if not impossible, to acquire. So naturally I will stop at nothing trying to develop it :)
A couple of months ago, this paper made the rounds: Absolute pitch in involuntary musical imagery [0]. In a small sample group, when people were asked to sing their current earworm, they were perfectly in pitch nearly half the time (44.7%). Random chance would be 8.3%.
It’s a fun thing to try for yourself. Just hum your current earworm into a voice memo, and check the pitch against a recording of the original song. You may discover a skill you never knew you had: implicit perfect pitch for involuntary music!
Trying to make this more interesting by reproducing a particular song on demand (there are references to that in the paper too; it also works better than random chance, but less so than the involuntary kind), I find it works best for songs that start off with a single note, preferably sung, or at least ones where you can immediately check whether you were right, e.g. “Tom’s Diner”. I’ve been having a lot of fun humming the first note of Laufey’s cover of “Sunny Side of the Street” [1] whenever I open YouTube. I’m more often right than wrong, and if I was wrong, I can just listen to the whole thing to brighten my day anyways.
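If you'd rather check yourself numerically than by ear, a tiny helper like this works (my own throwaway snippet, not from the paper): compare the hummed frequency to the original's in semitones, ignoring the octave; a random guess lands in the right pitch class about 1 in 12 times (~8.3%).

```python
# Compare a hummed frequency against the original, octave-agnostic.
import math

def semitone_offset(f_hummed: float, f_original: float) -> int:
    # 12 * log2 of the frequency ratio, rounded to the nearest semitone.
    return round(12 * math.log2(f_hummed / f_original))

def same_pitch_class(f_hummed: float, f_original: float) -> bool:
    # Ignore which octave you sang in; only the pitch class matters,
    # so a random guess hits about 1 time in 12 (~8.3%).
    return semitone_offset(f_hummed, f_original) % 12 == 0

print(same_pitch_class(261.6, 523.3))  # C4 vs C5 -> True
```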
It's probably an overrated skill depending on the musical task, but to say it's completely useless is really ignorant. Nearly anyone who studies music at the university level or above would find this statement ("completely useless") to be wildly incorrect.
> Random chance would be 8.3%.
A random human won't sing off the cuff with their tonal center magically quantized to one of the twelve keys in our modern western tuning (Equal Temperament).
The smiling emoji at the end of the first paragraph indicates that these statements were made somewhat in jest, or perhaps exaggerated. Of course some uses can be found for absolute pitch. I saw one a couple of weeks back, when Jacob Collier was tuning the audience choir to lead into "Somebody to Love" played on the piano. But had he not had absolute pitch, he might just have picked up a reference note from the piano or his in-ear monitors, like a filthy commoner. Usually when making music, good relative pitch is what's required, and a reference instrument is mostly at hand, making perfect pitch somewhat redundant. But do tell what you're doing with perfect pitch, I'm curious.
And out of curiosity, I went to your website and randomly listened to "The Fugue Song" [0]. Really loved it! Very nice moment when the singing comes in, repeating the phrase from the fugue-like guitar intro. Good song! (I'm a total sucker for Nina Simone's "Love Me Or Leave Me", do you know that one? A song where she inserts some counterpoint improvisations in the middle.) I'm listening to a bit of "Hiss" now.
> 8.3%
Rounded to the nearest semitone, of course; I left that detail out, it's in the paper.
It's just an anecdote, but: I remember playing in a band with an amateur musician friend who told me he had perfect pitch, and it was "very annoying", according to him, when we transposed songs to fit our voices. They just "sounded wrong" to him. He would make beginner mistakes because he relied on pitch memory rather than listening to us to know what to play/sing.
I have no idea how it works in general, but it seems like it was a problem for at least this one guy.
xD Well I didn't expect it to go there. I don't know that song but I'll have to listen to it when I get home.
I don't have perfect pitch but it's not too hard to imagine what I could do with it. For me, it'd simply make a lot of tasks faster. I've spent a lot of time throughout my life transcribing music. I can do it relatively quickly, but there are lots of moments where I have to confirm things, poke around to find the matching notes, or struggle to pin down the shades of a chord based on which notes are present.
Reading music will be easier for people with AP, especially in singing situations. Even if you're a pianist it will still be helpful. There are a few Marc-Andre Hamelin interviews out there where he describes some of the advantages. It's easier to read music if you know immediately what it's going to sound like. Again, this is possible with relative pitch, but it's just more work and slower.
Arranging and composing away from the keyboard will be much easier with perfect pitch.
As time goes on, it'll be less important, most likely. And yes, there are some downsides obviously. In my final aural training class in music school, we had a competition at the very end of the year for fun. It came down to a team of 3 I was on vs. a team of 3 that had a guy with AP. The final task was to sight-sing a musical 'round' (a composition where the melody repeats in the various voices, entering at different points and overlapping each other). The guy with AP actually ruined it for his team. They mistakenly chose him to finish the round instead of start it. Midway through their performance, the pitch on their team had drifted so heavily that they were in between notes on the piano when he took over. He tried to sing 'relative' to everyone else, but it was so hard for him. Singing out of key was so unnatural for him that he couldn't do it; it sounded really bad. Great guy though, and a ridiculously good violinist.
> A random human won't sing off the cuff with their tonal center magically quantized to one of the twelve keys in our modern western tuning (Equal Temperament).
Why is that relevant? Whatever pitch they pick would fall into one of the 12 buckets, even if it isn't precisely the correct pitch.
It's not called "in-the-ballpark" pitch, it's called perfect/absolute pitch. Being up to a quarter tone off is a large error in music. Thinking of pitch in terms of 12 buckets is not musically useful. The vast majority of music is based on consonance, where being even a few hertz off means unpleasant dissonance. TLDR: Thinking of pitch as 12 buckets is mostly irrelevant.
I've always observed that the number of random non-musicians who can get it right, at least within the handful of "major" keys among those twelve, is remarkable enough to be worth considering.
Since those are the only 12 notes so many people have been hearing from every direction for so long, mostly confined to no more than a few of the keys that are "dominant" as a result of modern instrumentation, they get ingrained in the psyche and the notes are almost memorized by frequency, with nothing in between. So that's what people reproduce without any training. Some can even have a more sensitive ear for out-of-tune notes than an actual music student of several years.
This is really a fun skill to learn that you have. I've had a pretty good ear for relative pitch since birth, which my music teachers picked up on right away (I could play songs "by ear" after hearing them a couple of times), but I struggled with blind pitch in the mornings... until I realized that, for whatever reason, I can hear the theme from Zora's Domain with perfect clarity, in the proper key.
I used that to fake absolute pitch for a while in college, then explained to my voice coach what I was doing, and he looked at me like I had three heads. I'll never forget it. :)
I find I can recall something with accurate pitch, but the "memory" of that pitch fades over time. Whatever my current favorite song is, I can hum it at the right pitch. But if I were to try to do so a month later, it would probably be transposed a bit, because I've somehow lost that sense of the exact correct pitch. My idea of what "feels right" in that regard somehow fades, or something...
If you are a regular HN reader who is (or was until this post) unfamiliar with Back to the Future, I'd love to know three more random facts about your life. In my world view, you are part of a fascinatingly small group of people.
Part 3 came out in 1990. So anyone born after that (less than 34 years old) who didn't bother to go back and watch it would qualify? I'm familiar with the series' existence, but had no idea what the 1.21 reference was. AMA, hah.
I’ve been on the BTTF ride at… wherever in Florida it is, and I loved that as a teenager. The films just never really appealed, though, for some reason. I guess one related fact would be that I have a lot of gaps like that in the movies I have seen. For instance, people are often shocked that I’ve never seen any of the Indiana Jones movies (also loved the rides!); but Star Wars I could probably recite the scripts of.
I don’t think I have any other facts that are very interesting, but then again I didn’t think not having seen BTTF was all that interesting either. For the record I was familiar with 1.21GW and what it related to… I don’t live under a rock!
I have kids that are in their late 20s. They never watch older movies unless someone forces them to. There is so much new media coming out that they don’t feel the need to watch older movies, even if everyone is telling them it is very good.
Couldn't you put that media on when they were kids?
I know movie nights are not a thing every family does, but I'd imagine having one day a modern movie, one day something oldish from the '80s-'90s, another day a classic from the '40s, etc.
Wouldn't that have worked if you started from when they were young?
I'm just thinking, as that is my plan for when/if I have kids: mix older media with new and just enjoy it with them. If it is truly good and not just nostalgia, it should be enjoyable even as a rewatch.
Since the franchise hasn't been rebooted like so many others, it hasn't seen the $$$ marketing that would introduce it to new generations.
Like The Princess Bride or Labyrinth, BTTF currently remains a phenomenon of the '80s and '90s -- familiar to most from that time and deeply treasured by some, but not refreshed and sustained the way the Star Wars, Star Trek, Marvel/DC, etc. brands have been.
I was wondering: have you thought about automation bias or automation complacency [0]? Sticking with the drop-tables example: if you have an agent that works quite well, the human in the loop will nearly always approve the task. The human will then learn over time that the agent "can be trusted", and will stop reviewing the pings carefully. Hitting the "approve" button will become somewhat automated by the human, and the risky tasks won't be caught by the human anymore.
Premature optimization and premature automation cause a lot of issues and lead to a lot of insight being overlooked.
By just doing something manually 10-100 times and collecting feedback, both your understanding of the problem and the possible solutions/specifications can evolve orders of magnitude better.
yeah, the people who reach for tools/automation before doing it themselves at least 3-10 times drive me crazy.
I think Uncle Bob or Martin Fowler said "don't buy a JIRA until you've done it with post-its for 3 months and you know exactly what workflow is best for your team".
this is fascinating and resonates with me on a deep level. I'm surprised I haven't stumbled across this yet.
I think we have this problem with all AI systems. E.g. I have let Cursor write wrong code from time to time and don't review it at the level I should... we need to solve that for every area of AI. Not a new problem, but definitely about to get way more serious.
This is something we frequently saw at Uber. I would say it's the same here; there's already an established pattern for this for any sort of destructive action.
Intriguingly, it's rather similar to what we see with LLMs - you want to really activate the person's attention rather than have them go off on autopilot; in this case, probably have them type something quite distinct in order to confirm it, to turn their brain on. Of course, you likely want to figure out some mechanism/heuristics, perhaps by determining the cost of a mistake, and using that to set the proper level of approval scrutiny: light (just click), heavy (have to double confirm via some attention-activating user action).
Finally, a third approach would be to make the action reversible: like in many applications (Uber Eats, Gmail, etc.), you can trigger something but it defers execution, giving you a chance to undo it. However, I think that causes people more stress, so it's often better to confirm up front than to act and then offer an undo. It's better to be very deliberate about what's a soft confirm and what's a hard confirm, optimizing for the human by providing the right balance of high certainty and low stress.
I think the canonical sort of approach here is to make them confirm what they're doing. When you delete a GitHub repo for example, you have to type the name of the repo (even though the UI knows what repo you're trying to delete).
If the table name is SuperImportantTable, you might gloss over that, but if you have to type that out to confirm you're more likely to think about it.
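A minimal sketch of that pattern applied to the drop-table case (just illustrative code from me; the names are made up):

```python
# Type-the-name-to-confirm pattern for a destructive action (illustrative only).
def confirm_destructive(resource_name: str) -> bool:
    print(f"This will permanently drop '{resource_name}'.")
    typed = input("Type the table name to confirm: ").strip()
    return typed == resource_name

if confirm_destructive("SuperImportantTable"):
    print("Dropping table...")  # the actual DROP TABLE would run here
else:
    print("Aborted: name did not match.")
```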