Would be nice, but no one can really do that at the moment.
There are specific tasks (especially character-level ones) which are hard due to the tokenizer, but even that isn't all that convincing, since there are plenty of character-level tasks which GPT-4 can do pretty well.
If you use it a lot you build an intuition for what kinds of tasks it will do well on, but it's not exactly rigorous.
> Would be nice, but no one can really do that at the moment.
Why not? Someone could definitely build up a database of why GPT is bad at some things and good at others. There are already good explanations for why it's terrible at math, why it doesn't handle single characters or numbers well, and so on.
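To make the tokenizer explanation concrete, here is a toy sketch. The vocabulary and the greedy longest-match rule are made up for illustration; real GPT tokenizers use learned BPE merges and a far larger vocabulary, so treat this as the general idea rather than how GPT-4 actually splits text.

```python
# Toy greedy longest-match tokenizer.
# VOCAB is hypothetical; it is NOT GPT-4's real vocabulary.
VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}

def tokenize(text, vocab=VOCAB):
    """Split text into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

word = "strawberry"
pieces = tokenize(word)
ids = [VOCAB[p] for p in pieces]

print(pieces)  # ['straw', 'berry']
print(ids)     # [0, 1]
```

The model only ever receives something like `[0, 1]`, so a question such as "how many r's are in this word?" depends on what it has learned about the spelling of each token, not on characters it can see directly. That would also fit with some character-level tasks working fine: spellings of common tokens may simply be well learned.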