
Is there a way to achieve the same with ollama?


Yes, Ollama has Qwen 3 and it works great on a Mac. It may be slightly slower than MLX since Ollama hasn't integrated that (Apple Silicon optimized) library yet, but Ollama models still use the Mac's GPU.

https://ollama.com/library/qwen3
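For reference, pulling and running it is a one-liner (the `qwen3:30b` tag is taken from the model page linked above; check that page for the exact tag you want):

```shell
# Download the model weights, then start an interactive chat session.
# Ollama uses the Mac's GPU via its bundled llama.cpp/Metal backend.
ollama pull qwen3:30b
ollama run qwen3:30b
```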


Yes, I did that, but it's not Apple Silicon optimized, so it was taking forever on 30B models. It's OK, but not fantastic.


You can just use llama.cpp directly instead (which is what Ollama uses under the hood via bindings). Just make sure you're on commit `d3bd719` or newer. I normally run it with NVIDIA/CUDA, but I tested on my MBP and haven't had any speed issues so far.
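A rough sketch of building and running it on a Mac (the Metal backend is enabled by default on Apple Silicon; the GGUF filename here is a placeholder, substitute whatever quantization you downloaded):

```shell
# Clone and build llama.cpp; CMake picks up the Metal backend
# automatically on Apple Silicon.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run an interactive session against a local GGUF file
# (replace the path with your actual model file).
./build/bin/llama-cli -m ./models/qwen3-30b-a3b-q4_k_m.gguf
```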

Alternatively, LMStudio has MLX support you can use as well.



