We will as soon as API access is widely available. Once a model goes live, we typically have one-shot reasoning benchmarks up in ~8 hours and comprehensive agentic/combined benchmarks up after 24-48 hours. We're working on building relationships with each lab to have the results before launch.
reply