I think cost should also be a direct consideration. Model performance varies wil...

elemeno · 2025-09-20T16:45:28 1758386728

I’ve been building a tool to help with this: Safety Evals In-a-Box [https://github.com/elemeno/seibox]. It’s a work in progress and not quite ready for public release, but it’s a multi-model eval runner (primarily for safety-oriented evals, though there’s no reason it can’t run other types as well!) and it includes cost and latency in its reporting.