Very interesting benchmark, excited to see what comes out of this. Considering humans are enormously more sample efficient than today's models, it seems clear there's a lot of room to close that gap. The fact that they hit 5.5x in the first week with relatively straightforward changes suggests we're nowhere near the ceiling for data efficiency.