yeah, that's a very big caveat - haven't checked neo 20b yet. I've had a hard ti...

yeah, that's a very big caveat - haven't checked neo 20b yet. I've had a hard time getting the AI21 models to use it and those are also pretty big so it's interesting why sometimes it works and sometimes it doesn't. (and Davinci > Codegen Davinci > Curie > J-6B). Fine tunes can also learn to do the inner monologue as well which is really cool - not sure how much is architecture vs. training parameters.