The benchmark results are so incredibly good they are hard to believe. A 30B model that's competitive with Gemini 2.5 Pro and way better than Gemma 27B?

Update: I tested "ollama run qwen3:30b" (the MoE) locally, and while it thought a lot, it wasn't that smart. After 3 follow-up questions it ended up in an infinite loop.

I just tried again, and it ended up in an infinite loop immediately, on a single prompt with no follow-up: "Write a Python script to build a Fitch parsimony tree by stepwise addition. Take a Fasta alignment as input and produce a nwk string as output."

Update 2: The dense one, "ollama run qwen3:32b", is much better (albeit slower, of course). It still keeps thinking for what feels like forever, until it misremembers the initial prompt.



Another thing you’re running into is the context window. Ollama sets a low context window by default, like 4096 tokens IIRC. The reasoning process can easily take more than that, at which point it is forgetting most of its reasoning and any prior messages, and it can get stuck in loops. The solution is to raise the context window to something reasonable, such as 32k.
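If you're hitting Ollama over its HTTP API rather than the CLI, you can also pass a bigger window per request via the options field. A minimal sketch, assuming the default localhost port and the qwen3:30b tag:

  curl http://localhost:11434/api/generate -d '{
    "model": "qwen3:30b",
    "prompt": "Why is the sky blue?",
    "options": { "num_ctx": 32768 }
  }'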

Instead of this very high latency remote debugging process with strangers on the internet, you could just try out properly configured models on the hosted Qwen Chat. Obviously the privacy implications are different, but running models locally is still a fiddly thing even if it is easier than it used to be, and configuration errors are often mistaken for bad model performance. If the models meet your expectations in a properly configured cloud environment, then you can put in the effort to figure out local model hosting.


I can't believe Ollama hasn't fixed the context window limits yet.

I wrote a step-by-step guide a while ago on how to set up Ollama with a larger context length: https://prompt.16x.engineer/guide/ollama

TLDR

  ollama run deepseek-r1:14b   # start an interactive session with the model
  /set parameter num_ctx 8192  # inside the session: raise the context window
  /save deepseek-r1:14b-8k     # save a copy with that setting baked in
  ollama serve                 # run the API server (if not already running)
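If you'd rather skip the interactive session, a Modelfile does the same thing (sketch, assuming the same 8k window):

  # Modelfile
  FROM deepseek-r1:14b
  PARAMETER num_ctx 8192

  ollama create deepseek-r1:14b-8k -f Modelfile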


Please check your num_ctx setting. Ollama defaults to a 2048-token context length and silently truncates the prompt to fit. Maddening.
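One way to sanity-check what you're running: recent Ollama versions print the model's max context length under the model info, and any num_ctx you saved shows up under Parameters:

  ollama show qwen3:30b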


You tried a 4-bit quantized version, not the original.

qwen3:30b has the same checksum as https://ollama.com/library/qwen3:30b-a3b-q4_K_M
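You can verify this locally: ollama list shows the digest ID for each tag, so if both tags resolve to the same blob, the IDs match (sketch):

  ollama pull qwen3:30b
  ollama pull qwen3:30b-a3b-q4_K_M
  ollama list | grep qwen3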


What is the original? The blog post doesn't state the quantization they benchmarked.


This 61GB one: https://ollama.com/library/qwen3:30b-a3b-fp16

You can see it's roughly the same size as the one in the official repo (16 files of 4GB each):

https://huggingface.co/Qwen/Qwen3-30B-A3B/tree/main
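The math checks out, too: 30.5B parameters at 2 bytes each in fp16 is roughly 61 GB, matching both the Ollama tag and the 16 x ~4 GB safetensors shards.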


fp16 is overkill, though. 8-bit is the sweet spot before quality degradation starts getting noticeable.
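If you want to try that, the q8_0 build is on Ollama; at roughly one byte per parameter it should land around half the fp16 size, a bit over 30 GB:

  ollama pull qwen3:30b-a3b-q8_0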


I haven't yet seen any evals comparing the original Qwen3-30B-A3B with https://ollama.com/library/qwen3:30b-a3b-q8_0



