Update on the benchmark numbers: the results in the original post were computed with a looser tokenizer, which made the budget less strict than it should have been. We've since fixed that, and the budget is now accurate end-to-end. Corrected numbers at 90% budget: HotpotQA F1 is 71.57 vs. 69.71 for the full-context baseline, a wider margin over the baseline than previously reported. Qasper is 46.25 vs. 47.22 (~98% of full-context quality). Updated results and scripts: https://github.com/HighSNRInc/highsnr-benchmarks
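
For anyone who wants to check the ratios, here is a small sketch that recomputes them from the scores quoted in this comment. The dictionary layout and the `relative_quality` helper are just for illustration and are not taken from the benchmark repo.

```python
# Scores copied from this comment (90% budget vs. full-context baseline).
results = {
    "HotpotQA": {"budgeted": 71.57, "full_context": 69.71},
    "Qasper": {"budgeted": 46.25, "full_context": 47.22},
}


def relative_quality(budgeted: float, full_context: float) -> float:
    """Return the budgeted score as a percentage of the full-context score."""
    return 100.0 * budgeted / full_context


for name, r in results.items():
    pct = relative_quality(r["budgeted"], r["full_context"])
    delta = r["budgeted"] - r["full_context"]
    print(f"{name}: {pct:.1f}% of full context ({delta:+.2f} points)")
```

Anything above 100% means the budgeted run scored higher than the full-context baseline, which is the HotpotQA case here.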

One thing worth clarifying: there's no model in the processing pipeline. The ranking is fully deterministic: the same input always produces the same output. Because there is no inference step, it's fast enough for synchronous calls, runs well on commodity CPUs without GPUs, and can handle high throughput without the latency or cost overhead that a model would add.
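
To make "deterministic, no model" concrete, here is a toy sketch of what a model-free, budget-constrained selection step could look like. It is not our actual ranking: the term-overlap scoring, the whitespace token count, and the names `count_tokens`, `score`, and `select` are all assumptions made up for this illustration.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: a whitespace split stands in for a real tokenizer.
    return len(text.split())


def score(query: str, passage: str) -> int:
    # Hypothetical scoring: number of distinct query terms found in the passage.
    query_terms = set(query.lower().split())
    passage_terms = set(passage.lower().split())
    return len(query_terms & passage_terms)


def select(query: str, passages: List[str], budget: int) -> List[str]:
    """Greedily keep the highest-scoring passages that fit in the token budget."""
    # Ties are broken by original position, so the ordering never depends on
    # anything other than the input itself.
    order = sorted(
        range(len(passages)),
        key=lambda i: (-score(query, passages[i]), i),
    )
    chosen = []
    used = 0
    for i in order:
        cost = count_tokens(passages[i])
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    # Return the kept passages in their original document order.
    return [passages[i] for i in sorted(chosen)]


passages = ["the cat sat", "dogs bark loudly at night", "a cat and a dog"]
print(select("cat dog", passages, budget=8))
```

There is no randomness and no learned component anywhere in this sketch, so repeated calls with the same arguments return the same list, and the cost is a sort plus a linear pass.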