
I understand that it's a different approach, but I would still have expected this paper to at least mention FlashAttention [1] since they both leverage flash memory.

[1] https://arxiv.org/abs/2205.14135



I'm pretty sure FlashAttention doesn't deal with flash memory at all.

From what I understand, FlashAttention is about using access patterns that better leverage fast local memory, especially SRAM. E.g., it's about keeping data in the CPU's L1 cache, or in whatever the GPU equivalent is (on-chip SRAM/shared memory).

(In other words: FlashAttention is concerned with the part of the memory hierarchy that's faster than DRAM; this paper is about better offloading to the part that's slower than DRAM.)
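
To make the distinction concrete, here's a minimal NumPy sketch of the tiling idea (an illustrative toy, not FlashAttention's actual fused CUDA kernel; the block size and function name are made up): K/V are processed one block at a time with a running softmax, so the working set stays block-sized (small enough to live in SRAM) and the full N x N score matrix is never materialized in DRAM.

    import numpy as np

    def tiled_attention(Q, K, V, block=64):
        # Process K/V in SRAM-sized blocks, maintaining running
        # softmax statistics so no N x N matrix is ever built.
        N, d = Q.shape
        out = np.zeros_like(Q)
        row_max = np.full(N, -np.inf)  # max score seen so far, per query
        row_sum = np.zeros(N)          # sum of exp(score - row_max)
        for j in range(0, K.shape[0], block):
            Kj, Vj = K[j:j+block], V[j:j+block]
            scores = Q @ Kj.T / np.sqrt(d)        # (N, block) score tile
            new_max = np.maximum(row_max, scores.max(axis=1))
            scale = np.exp(row_max - new_max)     # rescale old statistics
            probs = np.exp(scores - new_max[:, None])
            row_sum = row_sum * scale + probs.sum(axis=1)
            out = out * scale[:, None] + probs @ Vj
            row_max = new_max
        return out / row_sum[:, None]

    # Agrees with the naive version that materializes all N^2 scores:
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
    S = Q @ K.T / np.sqrt(32)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(tiled_attention(Q, K, V), ref)

The exp(row_max - new_max) rescaling is the online-softmax trick that lets the whole thing be computed in one streaming pass over K/V.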



