We've now partially replicated Reflection Llama 3.1 70B's eval claims (twitter.com/artificialanlys)
4 points by _micah_h on Sept 8, 2024 | 1 comment


And the tweet is gone after public uproar.

Now they claim that the 70B model performed worse than Llama 3.1 70B (and obviously worse than closed-source alternatives) [1].

Outstanding questions:

- What exactly did they "partially replicate"?

- Why were Redditors able to identify all the details (wrapped Claude, wrapped GPT-4o, initial prompt, a finetune of Llama 3.0 rather than 3.1) while ArtificialAnlys was not?

- Why, after the truth came out, do they still write "We are not clear", "We are not clear"?

[1] https://x.com/ArtificialAnlys/status/1832965630472995220



