lmxyy's comments | Hacker News

Introducing Radial Attention, a static sparse attention mechanism with O(n log n) complexity for long video generation! Key features:

* Plug-and-play: works with pretrained models such as Wan, HunyuanVideo, and Mochi
* Speeds up both training and inference by 2–4×, without quality loss
* Compatible with pre-trained LoRAs: applied to the 8-step FusionX LoRA, Radial Attention delivers a further 1.6× speedup
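
Below is a minimal, hypothetical sketch of what a static O(n log n) sparse attention mask can look like: a dense local band plus exponentially strided keys at larger distances. It only illustrates the complexity claim; the function name and the specific decay rule are assumptions, not the actual Radial Attention mask.

```python
# Hypothetical sketch of a static O(n log n) sparse attention mask:
# a dense local band plus exponentially strided keys further away.
# NOT the actual Radial Attention mask, just an illustration of how
# a static pattern can keep only ~O(log n) keys per query.
import numpy as np

def radial_style_mask(n: int, base_window: int = 16) -> np.ndarray:
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])        # token distance
    mask = dist < base_window                         # dense local band
    # Beyond the band, keep every `ring`-th key, where `ring` doubles
    # each time the distance doubles, so each row keeps O(log n) keys.
    exponent = np.floor(np.log2(np.maximum(dist, 1) / base_window)).astype(int) + 1
    ring = 2 ** np.clip(exponent, 0, None)
    mask |= (dist % ring == 0)
    return mask

if __name__ == "__main__":
    m = radial_style_mask(1024)
    print("mask density:", m.mean())                    # far below 1.0
    print("avg keys per query:", m.sum(axis=1).mean())  # grows roughly like log(n)
```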


Thanks for pointing this out. I've fixed the prompt. Both FLUX and PixArt use T5 as the text encoder, which has limited capability. Our quantization method preserves the image quality and content of the original 16-bit models well.



I think so. There are already some techniques, called rotation, that have a similar effect, but they incur additional overhead in diffusion models.
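
For context, here is a hedged numpy sketch of the rotation idea (in the spirit of QuaRot-style methods, heavily simplified): rotate the activations by an orthogonal matrix Q and the weights by Q^T, which leaves the product unchanged but spreads outliers across channels before quantization. The quantizer, shapes, and names are made up for illustration.

```python
# Hedged sketch of rotation-based outlier smoothing (QuaRot-style idea,
# simplified): (x @ Q) @ (Q.T @ w) == x @ w exactly for orthogonal Q,
# but the rotated tensors have their outliers spread across channels,
# which helps a coarse quantizer. All names/shapes here are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fake_quant(t, bits=4):
    # Toy symmetric per-tensor quantizer, for illustration only.
    scale = np.abs(t).max() / (2 ** (bits - 1) - 1)
    return np.round(t / scale) * scale

d = 64
x = rng.normal(size=(256, d))
x[:, 0] *= 50.0                                # one outlier channel
w = rng.normal(size=(d, d))

q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix

ref = x @ w
err_plain = np.linalg.norm(fake_quant(x) @ fake_quant(w) - ref)
err_rot = np.linalg.norm(fake_quant(x @ q) @ fake_quant(q.T @ w) - ref)
print(err_plain, err_rot)                      # rotation typically lowers the error
```

The extra rotation applied to activations at runtime is the kind of overhead the comment above refers to.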


Permuting entire columns at once should have zero overhead as long as you permute the rows of the next matrix to match. But since each entry of a column participates in a different scaling group, I suspect swapping two columns will reduce quantization error for some entries while increasing it for others, making it unlikely to yield a significant overall improvement this way.
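
To make the zero-overhead claim concrete, here is a small numpy check (hypothetical shapes and names): permuting the output columns of one weight matrix and the input rows of the next with the same permutation leaves the composed product unchanged.

```python
# Sketch: permuting columns of W1 together with rows of W2 is a no-op
# for the composed linear map (hypothetical shapes, for illustration).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))     # batch of activations
w1 = rng.normal(size=(32, 64))
w2 = rng.normal(size=(64, 16))

perm = rng.permutation(64)       # reorder the hidden features
w1_p = w1[:, perm]               # permute output columns of W1
w2_p = w2[perm, :]               # permute input rows of W2 to match

out = (x @ w1) @ w2
out_p = (x @ w1_p) @ w2_p
print(np.allclose(out, out_p))   # True: the permutation itself costs nothing
```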


FLUX-schnell takes only 800 ms on an RTX 5090.


SVDQuant now supports NVFP4 on NVIDIA Blackwell GPUs, with a 3× speedup over BF16 and better image quality than INT4. Try our interactive demo at https://svdquant.mit.edu/! All our code is available at https://github.com/mit-han-lab/nunchaku!
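
For readers curious about the underlying idea, here is a simplified numpy sketch of the low-rank-plus-low-bit decomposition behind SVDQuant: keep a small high-precision low-rank branch and quantize only the residual. The toy quantizer, synthetic weights, and rank are assumptions; this is not the library's implementation or the NVFP4 format.

```python
# Simplified sketch of the low-rank + low-bit idea behind SVDQuant:
# keep a small high-precision low-rank branch and quantize only the
# residual. Toy quantizer, synthetic weights, and rank are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fake_quant(t, bits=4):
    scale = np.abs(t).max() / (2 ** (bits - 1) - 1)
    return np.round(t / scale) * scale

# Synthetic weight with a strong low-rank/outlier component.
w = rng.normal(size=(512, 512)) + 5.0 * rng.normal(size=(512, 1)) * rng.normal(size=(1, 512))

err_plain = np.linalg.norm(fake_quant(w) - w)          # quantize everything at 4 bits

u, s, vt = np.linalg.svd(w, full_matrices=False)
r = 32
low_rank = (u[:, :r] * s[:r]) @ vt[:r]                 # high-precision low-rank branch
err_split = np.linalg.norm(low_rank + fake_quant(w - low_rank) - w)
print(err_plain, err_split)                            # the residual quantizes much better
```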


With the arrival of the RTX 5090, we built a high-performance workstation to maximize its AI computing potential. In this blog post, we share our experience—from overcoming setup challenges to testing its performance.

