
Is there a way to achieve the same with ollama?


Yes, Ollama has Qwen 3 and it works great on a Mac. It may be slightly slower than MLX since Ollama hasn't integrated that (Apple Silicon optimized) library yet, but Ollama models still use the Mac's GPU.

https://ollama.com/library/qwen3
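For reference, pulling and running it is a one-liner (the `qwen3:30b` tag is taken from the model page linked above; check that page for the exact tag you want):

```shell
# Download the model weights, then start an interactive chat session.
# Ollama uses the Mac's GPU via its bundled llama.cpp/Metal backend.
ollama pull qwen3:30b
ollama run qwen3:30b
```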


Yes, I did that, but it's not Apple Silicon optimized, so it was taking forever on 30B models. It's OK, but not fantastic.


You can just use llama.cpp directly instead (which is what Ollama uses under the hood via bindings). Just make sure you're on commit `d3bd719` or newer. I normally run it with NVIDIA/CUDA, but I tested on my MBP and haven't had any speed issues so far.
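A rough sketch of building and running it on a Mac (the Metal backend is enabled by default on Apple Silicon; the GGUF filename here is a placeholder, substitute whatever quantization you downloaded):

```shell
# Clone and build llama.cpp; CMake picks up the Metal backend
# automatically on Apple Silicon.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run an interactive session against a local GGUF file
# (replace the path with your actual model file).
./build/bin/llama-cli -m ./models/qwen3-30b-a3b-q4_k_m.gguf
```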

Alternatively, LMStudio has MLX support you can use as well.



