Hacker News | tripplyons's comments

It's a bad headline. They used publicly available blockchain transactions and didn't cause the collapse of the Terra ecosystem. Terra collapsed because it was a Ponzi scheme offering 20% APY on a fake stablecoin. The Terra stablecoin was not backed by real dollars, but instead by a cryptocurrency called Luna that did nothing other than let you issue Terra stablecoins.


I mean, they are being sued for insider trading, so it's not exactly a bad headline. Maybe it could say "alleged"?

It seems extremely unlikely to me, a casual observer of the shit show that was Luna/Terra, that the suit would be successful


ZKML is a very exciting emerging field, but the math is nowhere near efficient enough to prove an inference result for an LLM yet. They are probably just trying to sell their crypto token.


It is definitely true across different chips. The best kernel to use will vary with what chip it is running on, which often implies that the underlying operations will be executed in a different order. For example, with floating point addition, adding up the same values in a different order can return a different result because floating point addition is not associative due to rounding.


There are many ways to compute the same matrix multiplication that apply the sum reduction in different orders, which can produce different answers when using floating point values. This is because floating point addition is not truly associative because of rounding.
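A minimal demonstration in Python (the effect is the same in FP32/FP16 on GPUs, just at different magnitudes):

```python
# Floating point addition is not associative: the same three values
# summed in a different order can round to a different result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

The difference is tiny per addition, but a matrix multiplication performs millions of additions, so two kernels that reduce in different orders can produce visibly different outputs.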


Is that really going to matter in FP32, FP16 or BF16? I would think models would be written so they'd be at least somewhat numerically stable.

Also if the inference provider guarantees specific hardware this shouldn't happen.


Wait, wouldn't it be more significant in low-bit numbers, which is the whole reason they're avoided in maths applications? In any work I've ever done, low-bit numbers were entirely the reason exact order was important, whereas float64 or float128 makes it mostly negligible.


"Use after free in CSS" is a funny description to see.


I think they meant something like the CSS parser, or the CSS Object Model (CSSOM).


One of the other commenters wrote a post that said it was related to @font-feature-values


Why ?


To me at least it reads funny because when I think of CSS I think of the language itself and not the accompanying tools that are then running the CSS.

Saying "Markdown has a CVE" would sound equally off. I'm aware that it's not actually CSS that has the vulnerability, but when simplified that's what it sounds like.


Funny you'd mention that, when Notepad had a CVE in its Markdown parsing recently.


Wouldn't you just be able to shield the antenna to only point up? I think that is how some aircraft stay protected from GPS jamming/spoofing, and I assume you can do something similar.


The satellite still needs to be able to receive the signal from the terminal. If the Iranian government sets up transmitters that beam noise at the satellites, the satellite isn't going to be able to receive the low-power transmissions from the terminal if the jammer is close enough to the terminal.

How close you would have to be depends on the jamming power and the satellite's beamforming.

This is not the case for GPS, because GPS is receive-only: the satellite doesn't listen for user transmissions. (You could still try to jam the control uplink to prevent synchronization, which would decrease accuracy over a few days, but then you would have to be close to the GPS control stations and would probably get arrested soon.)


I did not consider that direction of communication! Seems much easier to locate and send nonsense to satellites than to land-based terminals.


I would guess that someone is beaming a whole lot of wideband power at the satellites themselves, to overload the input receivers.


Thank you! I now realize that my analogy was misplaced due to how Starlink is bidirectional and GPS is unidirectional.


Same here


I don't think you could even define an associative binary operator on states in the Game of Life because of its computational irreducibility.


CGOL is Turing complete. If you can make a NOR gate, you can make anything.


I know it is Turing-complete; I was instead commenting on its computational irreducibility. My point is that it is impossible to express the rules in the form of an associative operator over a sequence of board states. You could say the same thing about iterating with a sufficiently complex circuit of NOR gates.
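For concreteness, the Life update is a unary function on board states, not a binary operation combining two states (a sketch with numpy on a toroidal grid; the helper name is mine):

```python
import numpy as np

def life_step(board: np.ndarray) -> np.ndarray:
    """One step of Conway's Game of Life on a wrap-around grid."""
    # Count live neighbors by summing the eight shifted copies of the board.
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 neighbors,
    # or if it is already alive and has exactly 2.
    return ((neighbors == 3) | (board & (neighbors == 2))).astype(board.dtype)
```

Iterating this function is the only known way to get the state n steps ahead in general; the irreducibility claim is precisely that there is no shortcut formula that skips the intermediate states.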


It's not just the difference in price that would compensate you, it is also the funding rate.

Other than that, futures tend to be more liquid and capital efficient.
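A back-of-the-envelope illustration with made-up numbers (funding conventions vary by exchange; many perpetuals settle funding every 8 hours):

```python
# Hypothetical: long $60,000 notional of a perpetual future, with a
# funding rate of 0.01% per 8-hour interval paid by longs to shorts.
# All figures here are invented for illustration.
notional = 60_000.0
funding_rate = 0.0001            # 0.01% per interval (made-up figure)
intervals_per_year = 3 * 365     # three 8-hour intervals per day

annual_funding = notional * funding_rate * intervals_per_year
print(annual_funding)            # about $6,570/year, ~11% of notional
```

So even if the futures price tracked spot exactly, the funding flow alone can compensate (or cost) a holder a meaningful percentage per year.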


SDXL has been outclassed for a while, especially since Flux came out.


Subjective. Most in creative industries regularly still use SDXL.

Once the Z-image base model comes out and some real tuning can be done, I think it has a chance of taking over the role SDXL fills now.


I don't think that's fair. SDXL is crap at composition. It's really good with LoRAs to stylize/inpaint though.


Source?


Most of the people I know doing local AI prefer SDXL to Flux. Lots of people are still using SDXL, even today.

Flux has largely been met with a collective yawn.

The only thing Flux had going for it was photorealism and prompt adherence. But the skin and jaws of the humans it generated looked weird, it was difficult to fine tune, and the licensing was weird. Furthermore, Flux never had good aesthetics. It always felt plain.

Nobody doing anime or cartoons used Flux. SDXL continues to shine here. People doing photoreal kept using Midjourney.


> it was difficult to fine tune

Yep. It's pretty difficult to fine tune, mostly because it's a distilled model. You can fine tune it a little bit, but it will quickly collapse and start producing garbage, even though fundamentally it should have been an easier architecture to fine-tune compared to SDXL (since it uses the much more modern flow matching paradigm).

I think that's probably the reason why we never really got any good anime Flux models (at least not as good as they were for SDXL). You just don't have enough leeway to be able to train the model for long enough to make the model great for a domain it's currently suboptimal for without completely collapsing it.


> It's pretty difficult to fine tune, mostly because it's a distilled model.

What about being distilled makes it harder to fine-tune?


AFAIK a big part of it is that they distilled the guidance into the model.

I'm going to simplify all of this a lot so please bear with me, but normally the equation to denoise an image would look something like this:

    pos = model(latent, positive_prompt_emb)
    neg = model(latent, negative_prompt_emb)
    next_latent = latent + dt * (neg + cfg_scale * (pos - neg))
So what this does - you trigger the model once with a negative prompt (which can be empty) to get the "starting point" for the prediction, and then you run the model again with a positive prompt to get the direction in which you want to go, and then you combine them.

So, for example, let's assume your positive prompt is "dog", and your negative prompt is empty. Triggering the model with your empty prompt will generate a "neutral" latent, and then you nudge it in the direction of your positive prompt, in the direction of a "dog". And you do this for 20 steps, and you get an image of a dog.
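The update above, iterated for 20 steps, can be sketched end-to-end with a toy stand-in for the model (the real denoiser is a neural net conditioned on prompt embeddings; everything here is a made-up 1-D analogue just to show how the two calls are combined):

```python
import numpy as np

# Toy stand-in for the denoiser. A real model predicts a velocity field;
# here each "prompt embedding" is just a target vector, and the fake
# model points from the current latent toward it.
def model(latent, prompt_emb):
    return prompt_emb - latent

latent = np.zeros(4)
positive_prompt_emb = np.ones(4)    # stands in for "dog"
negative_prompt_emb = np.zeros(4)   # stands in for the empty prompt
cfg_scale = 7.0
dt = 0.05

for _ in range(20):
    pos = model(latent, positive_prompt_emb)
    neg = model(latent, negative_prompt_emb)
    latent = latent + dt * (neg + cfg_scale * (pos - neg))

# The latent gets steered well past the positive target, which is the
# over-steering effect of classifier-free guidance at cfg_scale > 1.
print(latent[0])
```

Note that each step costs two model evaluations, which is exactly the inference cost that guidance distillation eliminates.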

Now, for Flux the equation looks like this:

    next_latent = latent + dt * model(latent, positive_prompt_emb)
The guidance here was distilled into the model. It's cheaper to do inference with, but now we can't really train the model too much without destroying this embedded guidance (the model will just forget it and collapse).

There's also an issue of training dynamics. We don't know exactly how they trained their models, so it's impossible for us to jerry rig our training runs in a similar way. And if you don't match the original training dynamics when finetuning it also negatively affects the model.

So you might ask here - what if we just train the model for a really long time - will it be able to recover? And the answer is - yes, but at that point most of the original model will essentially be overwritten. People actually did this for Flux Schnell, but you need way more resources to pull it off and the results can be disappointing: https://huggingface.co/lodestones/Chroma


Thanks for the extended reply, very illuminating. So the core issue is how they distilled it, i.e. that they "baked in the offset" so to speak.

I did try Chroma and I was quite disappointed, what I got out looked nowhere near as good as what was advertised. Now I have a better understanding why.


How much would it cost the community to pretrain something with a more modern architecture?

Assuming it was carefully done in stages (more compute) to make sure no mistakes are made?

I suppose we won't need to with the Chinese gifting so much open source recently?


> How much would it cost the community to pretrain something with a more modern architecture?

Quite a lot. Search for "Chroma" (which was a partial-ish retraining of Flux Schnell) or Pony (which was a partial-ish retraining of SDXL). You're probably looking at a cost of at least tens of thousands or even hundreds of thousands of dollars. Even bigger SDXL community finetunes like bigASP cost thousands.

And it's not only the compute that's the issue. You also need a ton of data. You need a big dataset, with millions of images, and you need it cleaned, filtered, and labeled.

And of course you need someone who knows what they're doing. Training these state-of-art models takes quite a bit of skill, especially since a lot of it is pretty much a black art.


> Search for "Chroma" (which was a partial-ish retraining of Flux Schnell)

Chroma is not simply a "partial-ish" retraining of Schnell; it's a retraining of Schnell after rearchitecting part of the model (replacing a 3.3B-parameter portion of the model with a 250M-parameter replacement with a different architecture).

> You're probably looking at a cost of at least tens of thousands or even hundred of thousands of dollars.

For reference here, Chroma involved 105,000 hours of H100 GPU time [0]. Doing a quick search, $2/hr seems to be about the low end of pricing for H100 time per hour, so hundreds of thousands seems right for that model, and still probably lower for a base model from scratch.

[0] https://www.reddit.com/r/StableDiffusion/comments/1mxwr4e/up...

