Most of what people use agents for day to day can be one-shotted, though, and even collating and rating 10 results would be costly.
If I had a harness for evaluating the results and VC-level money, I'd be throwing an army of agents at well-defined experimental tasks as well.