Fine-tuned PaLM for medicine and fine-tuned Minerva for math both perform a good deal worse than GPT-4. A fine-tuned smaller model is by no means guaranteed to beat a larger, more general one (though you may still get acceptable performance). And the necessity of fine-tuning itself is frequently called into question with LLMs:
https://huggingface.co/papers/2308.00304
https://huggingface.co/papers/2308.07921
https://arxiv.org/abs/2211.09066