Introducing Radial Attention — a static sparse attention mechanism with O(n log n) complexity for long video generation! Here are some key features:
* Plug-and-play: works with pretrained models like Wan, HunyuanVideo, Mochi
* Speeds up both training and inference by 2–4× without quality loss
* Compatible with pretrained LoRAs. When applied to the 8-step FusionX LoRA, Radial Attention delivers a further 1.6× speedup
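
For intuition, here is a minimal, hypothetical sketch of the general idea behind a static sparse attention mask whose per-query budget shrinks with temporal distance, giving roughly O(n log n) attended entries overall. The frame-based grouping, the band/stride rule, and the dense masked-softmax fallback are illustrative assumptions, not the actual Radial Attention mask definition or kernel.

```python
# Illustrative sketch only: a static sparse mask where nearby frames are
# attended densely and distant frames increasingly sparsely (strided),
# applied via ordinary masked softmax attention. Not the official kernel.
import torch
import torch.nn.functional as F

def radial_style_mask(num_frames: int, tokens_per_frame: int, device="cpu") -> torch.Tensor:
    """Boolean (n, n) mask, True = attend, based on frame distance."""
    n = num_frames * tokens_per_frame
    frame_idx = torch.arange(n, device=device) // tokens_per_frame
    dist = (frame_idx[:, None] - frame_idx[None, :]).abs()      # frame distance per token pair
    # Band b = floor(log2(dist)); within band b keep only frames whose
    # distance is a multiple of 2**b, so density halves as distance doubles.
    band = torch.where(dist > 0, dist.float().log2().floor(),
                       torch.zeros_like(dist, dtype=torch.float))
    stride = (2 ** band).long()
    return (dist % stride) == 0

def sparse_attention(q, k, v, mask):
    """Masked softmax attention (dense fallback; a real sparse kernel
    would skip the masked blocks entirely)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    frames, tpf, d = 16, 4, 32
    n = frames * tpf
    q, k, v = (torch.randn(1, n, d) for _ in range(3))
    mask = radial_style_mask(frames, tpf)
    out = sparse_attention(q, k, v, mask)
    print(out.shape, f"mask density: {mask.float().mean():.2%}")
```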
Thanks for pointing this out. I've fixed the prompt. Both FLUX and PixArt use T5 as the text encoder, which has limited capability. Our quantization method preserves the image quality and content of the original 16-bit models well.
Permuting entire columns at once should have zero overhead, as long as you permute the rows of the next matrix to match. But since each entry of a column falls into a different scaling group, I suspect swapping two columns will reduce the quantization error of some groups while increasing it for others, making a significant overall improvement from this approach unlikely.
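
To make this concrete, here is a small numerical check under illustrative assumptions (the shapes, the ReLU activation, and a scaling-group layout of 64 consecutive columns per row are all made up, not taken from the actual kernels). Part 1 verifies that permuting W1's columns together with W2's rows leaves the stacked output unchanged, so the permutation is free at inference time. Part 2 shows that swapping two columns pushes per-group int4 error down for some groups and up for others.

```python
import torch

torch.manual_seed(0)
x  = torch.randn(8, 64)      # hypothetical activations
W1 = torch.randn(64, 256)    # first linear layer (columns = hidden features)
W2 = torch.randn(256, 32)    # next linear layer (rows = hidden features)

# Part 1: permuting W1's columns and W2's rows with the same permutation
# leaves the output unchanged, i.e. the reordering itself has zero overhead.
perm = torch.randperm(256)
out_ref  = torch.relu(x @ W1) @ W2
out_perm = torch.relu(x @ W1[:, perm]) @ W2[perm, :]
print("outputs identical:", torch.allclose(out_ref, out_perm, atol=1e-5))

# Part 2: per-group symmetric int4 quantization error, assuming groups of
# 64 consecutive columns within each row (an assumed layout).
def per_group_sq_error(W, group=64):
    G = W.reshape(W.shape[0], -1, group)               # [rows, groups, group]
    scale = G.abs().amax(dim=-1, keepdim=True) / 7.0   # symmetric int4 scale
    q = (G / scale).round().clamp(-8, 7) * scale
    return ((q - G) ** 2).mean(dim=-1)                 # error per scaling group

W1_swapped = W1.clone()
W1_swapped[:, [3, 130]] = W1_swapped[:, [130, 3]]      # swap columns in different groups
delta = per_group_sq_error(W1_swapped) - per_group_sq_error(W1)
print("groups improved:", (delta < 0).sum().item(),
      "groups worsened:", (delta > 0).sum().item())
```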
With the arrival of the RTX 5090, we built a high-performance workstation to maximize its AI computing potential. In this blog post, we share our experience—from overcoming setup challenges to testing its performance.