Hacker News
amelius
on March 12, 2025
on: Gemma3 – The current strongest model that fits on ...
Aren't there any "blind" benchmarks?
nathanasmith
on March 12, 2025
Unfortunately, that wouldn't help as much as you'd think: talented AI labs can watch the public leaderboard, note which models move up and down, and use that to deduce and target whatever the hidden benchmark is testing.
nickthegreek
on March 12, 2025
OpenRouter Arena Ratings are probably the closest thing.