
A few thoughts that aren't related to each other.

1. This is a brilliant hack. Kudos.

2. It would be great to see the best codecs included in the comparison - AVIF and JPEG XL. Without those it's rather incomplete. It's no surprise that JPEG and WebP totally fall apart at that bitrate.

3. A significant limitation of the approach seems to be that it targets extremely low bitrates where other codecs fall apart, but at these bitrates it incurs problems of its own (artifacts take the form of meaningful changes to the source image instead of blur or blocking, very high computational complexity for the decoder).

When only moderate compression is needed, codecs like JPEG XL already achieve very good results. This proof of concept focuses on the extreme case, but I wonder what would happen if you targeted much higher bitrates, say 5x higher than used here. I suspect (but have no evidence) that JPEG XL would improve in fidelity faster as you gave it more bits than this SD-based technique. Transparent compression, where the eye can't tell a visual difference between source and transcode (at least without zooming in) is the optimal case for JPEG XL. I wonder what sort of bitrate you'd need to provide that kind of guarantee with this technique.



I also thought it was odd that AVIF was not compared - it would show a major quality and size improvement over WebP.


The comparison doesn't make much sense because for fair comparisons you have to measure decompressor size plus encoded image size. The decompressor here is super huge because it includes the whole AI model. Also, everyone needs to have the exact same copy of the model in the decompressor for it to work reliably.


Only if the decompressor and image are transmitted over the same channel at the same time, and you only have a small number of images. When compressing images for the web I don't care if a WebP decompressor is smaller than a JPEG or PNG decompressor, because the recipient already has all of those.

Of course Stable Diffusion's 4 GB is much more extreme than Brotli's 120 kB dictionary size, and would bloat a browser's install size substantially. But for someone like Instagram or a camera maker it could still make sense. Or imagine phones having the dictionary shipped in the OS to save just a couple of kB on bad data connections.
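
To put a rough number on the amortization argument, here is a back-of-the-envelope sketch. The 4 GB model size is the figure mentioned above; the per-image sizes are made-up placeholders chosen only to match "a couple kB" of savings, not measurements of any codec, and `break_even_images` is a hypothetical helper name.

```python
import math


def break_even_images(model_bytes: int, baseline_bytes: int, compressed_bytes: int) -> float:
    """Number of images after which shipping the model has paid for itself.

    Returns math.inf when the new codec saves nothing per image.
    """
    saving_per_image = baseline_bytes - compressed_bytes
    if saving_per_image <= 0:
        return math.inf
    return model_bytes / saving_per_image


MODEL_BYTES = 4 * 10**9     # 4 GB model, figure taken from the comment above
BASELINE_BYTES = 7_000      # hypothetical size of an image in a conventional codec
COMPRESSED_BYTES = 5_000    # hypothetical size of the same image with the model-based codec

n = break_even_images(MODEL_BYTES, BASELINE_BYTES, COMPRESSED_BYTES)
print(f"Break-even after about {math.ceil(n):,} images")
```

With these placeholder numbers, saving 2 kB per image means the model download is only paid back after about two million images. That is hopeless for a single user, but not obviously unreasonable for a service or an OS vendor that ships the model once to every device.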


Even if dictionaries were shipped, the biggest difficulty would be performance and resources. Most of these models require beefy compute and a large amount of VRAM that isn't likely to ever exist on end devices.

Unless that can be resolved it just doesn't make sense to use it as a (de)compressor.



