Aren't there any "blind" benchmarks? | Hacker News


amelius on March 12, 2025 | on: Gemma3 – The current strongest model that fits on ...

Aren't there any "blind" benchmarks?

nathanasmith on March 12, 2025

Unfortunately, that wouldn't help as much as you think, since talented AI labs can just watch the public leaderboard, note which models move up and down, and from that deduce and target whatever the hidden benchmark is testing.
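A toy sketch of the kind of inference described above, assuming a lab has each model's scores on several public skill benchmarks plus that model's published score on the hidden leaderboard. Every model name, skill name, and number here is made up for illustration. Ranking the public skills by how strongly they correlate with the hidden scores hints at what the hidden benchmark rewards, even though its questions are never seen.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variation, so no signal to correlate
    return cov / (sx * sy)


def guess_hidden_focus(public_skills, hidden_scores):
    """Rank public skill axes by how closely they track a hidden leaderboard.

    public_skills: {skill: {model: score}} from benchmarks the lab can inspect.
    hidden_scores: {model: score} as shown on the (hypothetical) hidden leaderboard.
    Returns [(skill, r), ...] sorted from most to least correlated.
    """
    models = sorted(hidden_scores)
    ys = [hidden_scores[m] for m in models]
    ranking = []
    for skill, scores in public_skills.items():
        xs = [scores[m] for m in models]
        ranking.append((skill, round(pearson(xs, ys), 3)))
    ranking.sort(key=lambda pair: pair[1], reverse=True)
    return ranking


if __name__ == "__main__":
    # Hypothetical data: the hidden benchmark secretly tracks math ability.
    public = {
        "math": {"A": 40, "B": 55, "C": 70, "D": 85},
        "coding": {"A": 80, "B": 60, "C": 75, "D": 50},
        "trivia": {"A": 60, "B": 62, "C": 58, "D": 61},
    }
    hidden = {"A": 1010, "B": 1040, "C": 1075, "D": 1100}
    for skill, r in guess_hidden_focus(public, hidden):
        print(f"{skill:8s} r={r}")
```

With real leaderboards the signal is noisier and confounded (stronger models tend to be better at everything), but the more models and public axes a lab can compare, the sharper this kind of guess becomes.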

nickthegreek on March 12, 2025

OpenRouter Arena Ratings are probably the closest thing.
