For my use case, GPT-3.5 quality seems to be enough: it already produces near-optimal output, so GPT-4 wouldn't be much of an improvement. On the other hand, a context limit of 8k or even 32k tokens is simply not acceptable for my use case.
I also tried davinci, which works fine too, but models below davinci don't work well for my use case.