Not sure how he is measuring; I'm still closer to about a 60% success rate. It's more like 20% are acceptable as a one-shot, which goes up to 60% acceptable with some iteration, but the remaining 40% either need manual intervention to succeed or need such significant iteration that doing it manually is likely faster.
I can supervise maybe three agents in parallel before a task requiring significant hand-holding means I'm likely blocking an agent.
And the time an agent spends 'restlessly working' on something is usually inversely correlated with its likelihood of success. Usually if it's going down a rabbit hole, the correct thing to do is to intervene and reorient it.