
It’s just so clear the math is wrong on these things.

You’ve got an apparent contradiction: SGD (AdamW at around 1e-6, give or take) works. So we’ve got extremely abundant local maxima, identical up to some epsilon_0, and yet training always lands in the same place, so there are abundant “well-studied” minima, likewise symmetric up to some epsilon_1, with both epsilons roughly at the “start debugging if it gets any bigger” scale.

The maxima have meaningful curvature tensors at or adjacent to them: AdamW works.

But the joker in the deck: control vectors work. So you’re in a quasi-Euclidean region.

In fact all the useful regions are almost exactly the same; the weights are actually complex-valued, everyone knows this part…

The conserved quantity, up to some parameter (call it phi), is compression ratio.
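
To make the control-vector point concrete, here’s a toy sketch, entirely mine and numpy-only, nothing from a real model, with all sizes and populations made up: in a locally flat (quasi-Euclidean) region, adding a scaled activation offset should steer outputs roughly in proportion to the scale.

    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny frozen MLP standing in for a trained model (hypothetical sizes).
    W1 = rng.normal(size=(16, 8)) * 0.5
    W2 = rng.normal(size=(8, 8)) * 0.5
    W3 = rng.normal(size=(8, 4)) * 0.5

    def forward(x, steer=None):
        # Optionally add a control vector to the first hidden layer.
        h1 = np.tanh(x @ W1)
        if steer is not None:
            h1 = h1 + steer
        h2 = np.tanh(h1 @ W2)
        return h2 @ W3

    # The usual steering recipe at toy scale: the control vector is the
    # difference of mean hidden activations between two input populations.
    xs_a = rng.normal(loc=+1.0, size=(100, 16))
    xs_b = rng.normal(loc=-1.0, size=(100, 16))
    v = np.tanh(xs_a @ W1).mean(axis=0) - np.tanh(xs_b @ W1).mean(axis=0)

    # If the region is quasi-Euclidean, doubling the vector roughly doubles
    # the effect on the output; gross curvature would break this.
    x = rng.normal(size=16)
    base = forward(x)
    for alpha in (0.25, 0.5, 1.0, 2.0):
        delta = forward(x, steer=alpha * v) - base
        print(alpha, np.linalg.norm(delta))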

Maybe in a year or two, when Altman is in jail and Mme. Su gives George cards that work, we’ll crunch numbers more interesting than what a googol FMA units cost, and Emmy Noether will get some damned credit for knowing this a century ago.



In an attempt to get some basic idea of what you wrote, I turned to ChatGPT, but ended up having a deeply dystopian conversation. Admittedly triggered by me.

https://chat.openai.com/share/d228e04e-ae36-4468-ac45-fdb035...


I was fascinated enough to read it to the end.


I think that's expected. SGD in such a high-dimensional space is exceedingly likely to find such a minimum.


I’m aware of both theoretical and empirical arguments, which I find quite persuasive, that you’re exactly right.

I think it’s extremely thought-provoking to ask why they would be symmetric, and locally Euclidean, in such abundance.
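
For what it’s worth, the standard random-matrix intuition fits in a few lines of numpy. This is a sketch of my own, after the Bray-Dean / Dauphin et al. line of argument, not a proof: for a random symmetric “Hessian” at a critical point, the fraction of negative eigenvalues concentrates near 1/2 as dimension grows, so all-positive-curvature minima are vanishingly rare among critical points, and the ones descent settles into are a very selected population.

    import numpy as np

    rng = np.random.default_rng(0)

    for n in (2, 10, 100, 1000):
        A = rng.normal(size=(n, n))
        H = (A + A.T) / 2                       # GOE-style symmetric matrix
        frac_neg = (np.linalg.eigvalsh(H) < 0).mean()
        print(n, round(frac_neg, 3))            # -> approaches 0.5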


I have no idea what you’re saying. It sounds important and interesting but I’d like some more details.


Roughly, the parameters of any NN (and of many other models as well) can be thought of as a space that is flatter or smoother in one region or another, or under some pretty fancy “zooming” (the Atiyah-Singer index theorem, give or take).

The way we train them involves finding steepness and chasing it, and that almost always works for a bit, often for quite a while. But the flat places it ends up in are both really flat, and zillions of them are nearly identical.

Those two sets of nearly identical “places”, and in particular the selection bias that makes some of them useful, are called (together or separately) a “gauge symmetry”, which basically means things remain true as you vary things a lot. The things that remain true are usually “conserved quantities”, and in the case of OpenAI 100% compressing the New York Times, the conserved quantity is compression ratio, up to some parameter of lossiness.
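
Here’s one concrete, textbook instance of a gauge symmetry in neural nets, as a minimal illustration of my own (it isn’t necessarily the symmetry I mean above): for a ReLU layer, scaling the incoming weights by alpha > 0 and the outgoing weights by 1/alpha leaves the network function exactly unchanged, so whole orbits of weight settings are “nearly identical places”.

    import numpy as np

    rng = np.random.default_rng(0)

    W1 = rng.normal(size=(10, 5))
    W2 = rng.normal(size=(5, 3))
    relu = lambda z: np.maximum(z, 0.0)

    def f(x, A, B):
        return relu(x @ A) @ B

    x = rng.normal(size=(4, 10))
    alpha = 3.7                                  # any positive rescaling
    out_orig = f(x, W1, W2)
    out_gauged = f(x, W1 * alpha, W2 / alpha)    # far away in weight space...
    print(np.allclose(out_orig, out_gauged))     # ...same function: True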


I am probably way off base here, but what I think you are saying is that these 'flat regions' come close to lossless compression, and thus copyright infringement is occurring?


Not quite: the abuse of the commons, in trivial violation of the spirit of our system of government, is suggested (I’d contend demonstrated) by necessary properties of the latent manifolds.

The uniformity (gauge symmetry up to a bound) of such regions is a way of thinking about the apparent contradiction between the properties of a billion-dimensional space before and after a scalar loss has pushed a gradient around in it.


Okay, yeah obviously there is a loss of entropy.


Entropy is a tricky word: legend has it that von Neumann persuaded Shannon to use it for the logarithmic information measure because “no one knows what it means anyways”.

These days we have KL-divergence and information gain and countless other ways to be rigorous, but you still have to be kind of careful with “macro” vs. “micro” states; it’s just a slippery concept.
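
To make “these days we have KL-divergence” concrete, here’s a minimal numpy version of the discrete definitions, a two-function sketch of my own:

    import numpy as np

    def H_bits(p):
        # Shannon entropy in bits.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def kl_bits(p, q):
        # KL divergence D(p || q) in bits, a.k.a. relative entropy,
        # the expected information gain of p over q.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        m = p > 0
        return float(np.sum(p[m] * np.log2(p[m] / q[m])))

    p = [0.5, 0.25, 0.25]
    u = [1/3, 1/3, 1/3]
    print(H_bits(p))        # 1.5 bits
    print(kl_bits(p, u))    # log2(3) - 1.5, about 0.085 bits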

Whether some 7B-parameter NN that was, like, Xavier/He-initialized, or whatever the Fortress of Solitude people are doing these days, is more or less unique before versus after you push an exabyte of Harry Potter fan fiction through it?

I think that’s an interesting question even if I (we) haven’t yet posed it in a rigorous way.
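
One crude way to start posing it, entirely my own framing and at toy scale: compare a histogram entropy estimate of the weight distribution at a Xavier-style init against the same estimate after some training. Everything below is hypothetical (sizes, data, learning rate); it just makes the question runnable.

    import numpy as np

    rng = np.random.default_rng(0)

    def hist_entropy_bits(w, bins=64):
        # Histogram estimate of the entropy of a weight sample, in bits.
        counts, _ = np.histogram(w, bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    # Tiny one-hidden-layer regression net, Xavier-style init.
    n_in, n_hid = 8, 256
    W1 = rng.normal(size=(n_in, n_hid)) * np.sqrt(1 / n_in)
    w2 = rng.normal(size=n_hid) * np.sqrt(1 / n_hid)
    init_w = np.concatenate([W1.ravel(), w2])

    # Toy data, plain gradient descent on squared error.
    X = rng.normal(size=(512, n_in))
    y = np.sin(X).sum(axis=1)
    lr = 1e-2
    for _ in range(500):
        h = np.tanh(X @ W1)
        err = h @ w2 - y
        g2 = h.T @ err / len(X)
        gh = np.outer(err, w2) * (1 - h**2)   # backprop through tanh
        g1 = X.T @ gh / len(X)
        w2 -= lr * g2
        W1 -= lr * g1

    trained_w = np.concatenate([W1.ravel(), w2])
    print(hist_entropy_bits(init_w), hist_entropy_bits(trained_w))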


Rare rule violation on the downvote thing.

Make a dissenting case or leave it the fuck alone. I’ve been pretty laid back about the OpenAI bot thing but I’m over it.


You are cloaking what seems to be an opinion of some kind (unclear what: something about Sam Altman, or maybe about copyright, or maybe against the NYT lawsuit?) in abstruse math.

The conclusions you seem to draw are by no means established, and at best seem only vaguely related to the unclear moral, ethical, or legal stance you seem to hold.


Ben, honestly, some of your comments are extremely abstruse, to the point where people can't tell if you are serious or not. That's my take anyway (I don't have downvote power).


I think the contentious tone of the comment, combined with the mathematical abstruseness, makes it doubly difficult to determine whether the criticism stems from genuine high-level math other people don't understand or from a personal axe to grind with... someone (OpenAI? Sam Altman? gradient.ai? Meta?).

For a reader who doesn't understand the math (or its implications for and connections to the shared link here, especially since the link is sparse on details unless you know where to look), I could see leaning toward the latter and downvoting without comment.


A sibling asked about the math, and I gave some hopefully useful analogies. I’m happy to elaborate further on what a colossal waste of money the current mega-autoencoders are, and why that follows from some geometry and topology that’s table stakes in AI now.

“Jail Altman” is basically my sig now. And no one wonders why someone would be passionate about that. Not in good faith.


The mathematics is necessarily an argument sketch.

The single-minded, singular goal of seeing Sam Altman answer to a criminal jury for his crimes?

As serious as a guy who’s happy to talk to journalists. I won’t sleep a full night until he hears a verdict carried by a bailiff.

Any remaining confusion?


@dang, the comment where I finally accuse big Azure IPv4 blocks of manipulating the site has the same number of points as there are downvotes on an argument that’s south of a preprint but north of what passes for AI math here.

This is by no means the worst example this month. You run the best moderation team on the Internet, but no one at OpenAI (including Fidji) will flat-out deny they’re doing it, and it’s just obvious.

I know you’re doing yeoman’s work like always. Have someone let @sama know that at least one person is going to start making charts. Not here.


You are being downvoted because you are annoying.


Please elaborate?


I’ve heard feedback in this thread that I’m being mathematically abstruse, and that I’m being controversial or inflammatory or something.

I’m paying attention, but we are talking about a giant neural network trained by my friends and former colleagues at FAIR, based mostly out of FBNY, where I used to go every day; so I’ll contend there’s some math involved. This is a topic for people who make a serious priority of it these days.

The controversial piece no one is coming right out and saying is, I think, my “fuck @sama” refrain.

Though how something that’s a meme on YouTube channels about TypeScript is a bigger topic than finally giving Emmy Noether her props (if she’d been a man she’d be far more famous than, e.g., Heisenberg) eludes me.

I’m saying that an iconic mathematician and physicist, deprived of her rightful place in history, had it right, and that once crooks like “Fired for Fraud Repeatedly” Altman and Madame Su are out of the picture, we might re-learn what she taught us.

On reflection? Fuck you, you’re annoying, ignorant, and a shill if your comments are anything to go by.


Yeah, like I said, you're really annoying.


You could have learned something.

Instead we all wasted memory remembering that twice.

I plan to forget your username. I hope I never have cause to remember it.

Ronin, masterless. There’s no one to call me to heel if I take a dislike.


Ironic



