Introducing Radial Attention — a static sparse attention mechanism with O(n log n) complexity for long video generation! Here are some key features:
* Plug-and-play: works with pretrained models like Wan, HunyuanVideo, Mochi
* Speeds up both training and inference by 2–4× without quality loss
* Compatible with pretrained LoRAs. When applied to the 8-step FusionX LoRA, Radial Attention delivers a further 1.6× speedup
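
For intuition, here is a minimal, hypothetical sketch of the general idea behind a static sparse attention mask whose per-query budget shrinks with temporal distance, giving roughly O(n log n) attended entries overall. The frame-based grouping, the band/stride rule, and the dense masked-softmax fallback are illustrative assumptions, not the actual Radial Attention mask definition or kernel.

```python
# Illustrative sketch only: a static sparse mask where nearby frames are
# attended densely and distant frames increasingly sparsely (strided),
# applied via ordinary masked softmax attention. Not the official kernel.
import torch
import torch.nn.functional as F

def radial_style_mask(num_frames: int, tokens_per_frame: int, device="cpu") -> torch.Tensor:
    """Boolean (n, n) mask, True = attend, based on frame distance."""
    n = num_frames * tokens_per_frame
    frame_idx = torch.arange(n, device=device) // tokens_per_frame
    dist = (frame_idx[:, None] - frame_idx[None, :]).abs()      # frame distance per token pair
    # Band b = floor(log2(dist)); within band b keep only frames whose
    # distance is a multiple of 2**b, so density halves as distance doubles.
    band = torch.where(dist > 0, dist.float().log2().floor(),
                       torch.zeros_like(dist, dtype=torch.float))
    stride = (2 ** band).long()
    return (dist % stride) == 0

def sparse_attention(q, k, v, mask):
    """Masked softmax attention (dense fallback; a real sparse kernel
    would skip the masked blocks entirely)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    frames, tpf, d = 16, 4, 32
    n = frames * tpf
    q, k, v = (torch.randn(1, n, d) for _ in range(3))
    mask = radial_style_mask(frames, tpf)
    out = sparse_attention(q, k, v, mask)
    print(out.shape, f"mask density: {mask.float().mean():.2%}")
```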
Thanks for pointing this out. I've fixed the prompt. Both FLUX and PixArt use T5 as the text encoder, which has limited capability. Our quantization method preserves the image quality and content of the original 16-bit models well.
Permuting entire columns at once should have zero overhead, as long as you permute the rows of the next matrix to match. But since each entry of a column falls into a different scaling group, I suspect swapping two columns will reduce the quantization error of some groups while increasing it for others, making a significant overall improvement from this approach unlikely.
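
To make this concrete, here is a small numerical check under illustrative assumptions (the shapes, the ReLU activation, and a scaling-group layout of 64 consecutive columns per row are all made up, not taken from the actual kernels). Part 1 verifies that permuting W1's columns together with W2's rows leaves the stacked output unchanged, so the permutation is free at inference time. Part 2 shows that swapping two columns pushes per-group int4 error down for some groups and up for others.

```python
import torch

torch.manual_seed(0)
x  = torch.randn(8, 64)      # hypothetical activations
W1 = torch.randn(64, 256)    # first linear layer (columns = hidden features)
W2 = torch.randn(256, 32)    # next linear layer (rows = hidden features)

# Part 1: permuting W1's columns and W2's rows with the same permutation
# leaves the output unchanged, i.e. the reordering itself has zero overhead.
perm = torch.randperm(256)
out_ref  = torch.relu(x @ W1) @ W2
out_perm = torch.relu(x @ W1[:, perm]) @ W2[perm, :]
print("outputs identical:", torch.allclose(out_ref, out_perm, atol=1e-5))

# Part 2: per-group symmetric int4 quantization error, assuming groups of
# 64 consecutive columns within each row (an assumed layout).
def per_group_sq_error(W, group=64):
    G = W.reshape(W.shape[0], -1, group)               # [rows, groups, group]
    scale = G.abs().amax(dim=-1, keepdim=True) / 7.0   # symmetric int4 scale
    q = (G / scale).round().clamp(-8, 7) * scale
    return ((q - G) ** 2).mean(dim=-1)                 # error per scaling group

W1_swapped = W1.clone()
W1_swapped[:, [3, 130]] = W1_swapped[:, [130, 3]]      # swap columns in different groups
delta = per_group_sq_error(W1_swapped) - per_group_sq_error(W1)
print("groups improved:", (delta < 0).sum().item(),
      "groups worsened:", (delta > 0).sum().item())
```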
With the arrival of the RTX 5090, we built a high-performance workstation to maximize its AI computing potential. In this blog post, we share our experience—from overcoming setup challenges to testing its performance.