It's above Opus, in second position, in the "English"-only category. It probably suffers in the overall score due to poor multilingual ability (afaik 95% of its training data was English only).
The usual caveat about small sample size applies, though: as of now the CI is fairly big. It's also not at the level of those two in the "Code" category. I hope Meta will give the CodeLlama variant an update again.