
I understand that it's a different approach, but I would still have expected this paper to at least mention FlashAttention [1] since they both leverage flash memory.

[1] https://arxiv.org/abs/2205.14135



I'm pretty sure FlashAttention doesn't deal with flash memory at all.

From what I understand, FlashAttention is about using access patterns that better leverage fast local memory, especially SRAM. E.g., it's about keeping data in the CPU's L1 cache, or in whatever the GPU equivalent is (on-chip SRAM/shared memory).

(In other words: FlashAttention is concerned with the part of the memory hierarchy that's faster than DRAM; this paper is about better offloading to the part that's slower than DRAM.)
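
To make the distinction concrete, here's a minimal NumPy sketch of the tiling idea (an illustrative toy, not FlashAttention's actual fused CUDA kernel; the block size and function name are made up): K/V are processed one block at a time with a running softmax, so the working set stays block-sized (small enough to live in SRAM) and the full N x N score matrix is never materialized in DRAM.

    import numpy as np

    def tiled_attention(Q, K, V, block=64):
        # Process K/V in SRAM-sized blocks, maintaining running
        # softmax statistics so no N x N matrix is ever built.
        N, d = Q.shape
        out = np.zeros_like(Q)
        row_max = np.full(N, -np.inf)  # max score seen so far, per query
        row_sum = np.zeros(N)          # sum of exp(score - row_max)
        for j in range(0, K.shape[0], block):
            Kj, Vj = K[j:j+block], V[j:j+block]
            scores = Q @ Kj.T / np.sqrt(d)        # (N, block) score tile
            new_max = np.maximum(row_max, scores.max(axis=1))
            scale = np.exp(row_max - new_max)     # rescale old statistics
            probs = np.exp(scores - new_max[:, None])
            row_sum = row_sum * scale + probs.sum(axis=1)
            out = out * scale[:, None] + probs @ Vj
            row_max = new_max
        return out / row_sum[:, None]

    # Agrees with the naive version that materializes all N^2 scores:
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
    S = Q @ K.T / np.sqrt(32)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(tiled_attention(Q, K, V), ref)

The exp(row_max - new_max) rescaling is the online-softmax trick that lets the whole thing be computed in one streaming pass over K/V.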



