
I don't find llama3.1 noticeably worse when quantized to 8-bit integers than the original fp16, to be honest. It's also a lot faster.
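If you want to try it, here's a minimal sketch using Hugging Face transformers with bitsandbytes int8 quantization; the checkpoint ID is an assumption, swap in whichever Llama 3.1 variant you actually run:

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumed checkpoint; use whatever Llama 3.1 variant you have access to.
    model_id = "meta-llama/Llama-3.1-8B-Instruct"

    # LLM.int8() via bitsandbytes: weights stored as int8,
    # roughly halving memory versus fp16.
    quant = BitsAndBytesConfig(load_in_8bit=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant,
        device_map="auto",  # place layers on GPU/CPU as memory allows
    )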

Of course, even then you're not going to fit the whole 128k context window in 16GB, but if you don't need that, it works great.
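The back-of-envelope arithmetic supports that: a rough sketch below, assuming the published Llama 3.1 8B config (32 layers, 8 KV heads with grouped-query attention, head dimension 128), shows the fp16 KV cache alone at the full 128k window already takes ~16 GiB before storing a single weight:

    # KV-cache size for Llama 3.1 8B at the full 128k context window.
    # Config values assumed from the published architecture:
    # 32 layers, 8 KV heads (GQA), head dimension 128.
    n_layers, n_kv_heads, head_dim = 32, 8, 128
    seq_len = 128 * 1024      # 128k-token context
    bytes_per_elem = 2        # fp16

    # Factor of 2 covers both the K and the V tensors per layer.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    print(f"{kv_bytes / 2**30:.1f} GiB")  # -> 16.0 GiB, before any weights

Add roughly 8 GB of int8 weights for the 8B model on top of that and 16GB is nowhere near enough for the full window; a few thousand tokens of context fits comfortably, though.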


