> The cognitive burden is much lower when the AI can correctly do 90% of the work.
It's a high cognitive burden if you don't know which 10% of the work the AI failed to do / did incorrectly, though.
I think you're picturing a percentage indicating what scope of the work the AI covered, but the parent was thinking about the accuracy of the work it did cover. But maybe what you're saying is if you pick the right 90% subset, you'll get vastly better than 98% accuracy on that scope of work? Maybe we just need to improve our intuition for where LLMs are reliable and where they're not so reliable.
Though as others have pointed out, these are just made-up numbers we're tossing around. Getting 99% accuracy on 90% of the work is very different from getting 75% accuracy on 50% of the work. The real values vary a lot by problem domain and by the user's prompting skill, but it will be really interesting as studies start to emerge that give us a better idea of typical values in at least some domains.
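To make the scope-vs-accuracy distinction concrete, here's a rough sketch using the same made-up numbers. It assumes the simplest possible model (my assumption, not anything measured): the share of the total work that comes back correct is coverage times accuracy, and the share that comes back wrong is coverage times (1 - accuracy). The function names are just illustrative.

```python
def correct_share(coverage: float, accuracy: float) -> float:
    """Fraction of the total work the AI both attempted and got right."""
    return coverage * accuracy


def wrong_share(coverage: float, accuracy: float) -> float:
    """Fraction of the total work the AI attempted but got wrong."""
    return coverage * (1 - accuracy)


# Made-up scenarios from the thread: (coverage, accuracy).
scenarios = {
    "99% accuracy on 90% of the work": (0.90, 0.99),
    "75% accuracy on 50% of the work": (0.50, 0.75),
}

for label, (coverage, accuracy) in scenarios.items():
    print(
        f"{label}: "
        f"{correct_share(coverage, accuracy):.1%} correct, "
        f"{wrong_share(coverage, accuracy):.1%} wrong and needing review, "
        f"{1 - coverage:.1%} not attempted"
    )
```

Under that toy model the first case leaves you with about 89% of the job done correctly and under 1% to hunt down, while the second leaves about 37.5% done correctly and 12.5% wrong. The wrong share is the part that drives the cognitive burden, since you don't know in advance where it is.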