Hacker News
throwdbaaway 77 days ago | on: How to run Qwen 3.5 locally
Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP (prompt processing) and 40.7 tok/s TG (token generation) at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context.
35B A3B is faster but didn't do too well in my limited testing.
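For anyone wanting to put the 27B numbers in relative terms, here is a minimal sketch that computes the slowdown from zero context to 40960 context. The four throughput figures are the ones quoted above; the helper name `relative_drop` is just an illustrative choice, not something from ik_llama.cpp.

```python
# Figures quoted in the comment (27B 4bpw quant, RTX 3090, ik_llama.cpp).
PP_ZERO, PP_40K = 1312.0, 1009.0  # prompt processing, tok/s
TG_ZERO, TG_40K = 40.7, 36.2      # token generation, tok/s


def relative_drop(before: float, after: float) -> float:
    """Fractional slowdown going from `before` to `after` (hypothetical helper)."""
    return (before - after) / before


pp_drop = relative_drop(PP_ZERO, PP_40K)
tg_drop = relative_drop(TG_ZERO, TG_40K)

print(f"PP drop at 40960 context: {pp_drop:.1%}")  # 23.1%
print(f"TG drop at 40960 context: {tg_drop:.1%}")  # 11.1%
```

So by these figures, prompt processing loses roughly a quarter of its speed at 40960 context, while generation loses closer to a tenth.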
ranger_danger 76 days ago
With regular llama.cpp on a 3070 Ti I get 60 tok/s TG with the 9B model; it's quite impressive.