Hacker News

My college professor has certifications and has passed tests that weren't in their training data.

My college professor was also willing to say "I don't know, ask me next class"



> My college professor was also willing to say "I don't know, ask me next class"

This is a key differentiator that I see in humans over LLMs: knowing one's limits.


> My college professor has certifications and has passed tests that weren't in their training data.

Granted, they aren't (and can't be) as rigorous as the tests your professor took, but new models are run through evaluation suites before being released, too.

That being said, I saw my college professors making up things, too (mind you, they had all graduated from very good schools). One example I remember was our argument with a professor who insisted there is a theoretical limit of 1 on the coefficient of friction. That could reasonably be categorised as a hallucination, since it was completely made up and didn't make sense. Maybe it was in his training data (i.e. his own professors).

I agree with the "I don't know" part, though. This is something that LLMs are notoriously bad at.


What do you consider 'not in its training data'?

I just asked Claude a question I am pretty sure was not in its training data.

* https://i.imgur.com/XjvImeT.jpeg


It is immediately wrong in Step 1. A newborn is not a 2:1 ratio of height:width. Certainly not 25cm width (what does that even mean? Shoulder to shoulder?).

This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.

Indeed you are showing that something not in the training data leads to failure.


Babies also aren't rectangles.. you could lay a row shoulder to shoulder, then do another row upside down from the first and their heads would fit between the heads of the first row, saving space.

Edit: it also doesn't account for the fact the moon is more or less a sphere, and not a flat plane.
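The back-of-the-envelope estimate the thread is picking apart is easy to sanity-check directly. A minimal sketch, where the lunar radius is a real figure but the 50 cm x 25 cm newborn "footprint" is the same illustrative assumption Claude made, not a measured standard:

```python
import math

MOON_RADIUS_M = 1.737e6   # mean lunar radius, ~1737 km
BABY_LENGTH_M = 0.50      # assumed newborn length (illustrative)
BABY_WIDTH_M = 0.25       # assumed "shoulder width" (made up, as the thread notes)

# Treating the Moon as a sphere, not a flat disk:
sphere_area = 4 * math.pi * MOON_RADIUS_M ** 2

# Naive rectangle packing, ignoring curvature and baby geometry:
babies_naive = sphere_area / (BABY_LENGTH_M * BABY_WIDTH_M)

print(f"Lunar surface area: {sphere_area:.3e} m^2")
print(f"Naive rectangle packing: {babies_naive:.2e} babies")
```

Even this crude version shows why the inputs matter more than the arithmetic: halving the assumed width doubles the answer, and the interlocking head-to-toe packing suggested above would raise it further.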



I guess coming up with a truly original question is tougher than it seems. Any ideas?


Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...

Edit: couldn't resist, and dammit!!

Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.

But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?


An African horse or a European horse?


Did you actually check the math it did? LLMs are usually terrible at math because, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes as LLMs learn to leverage other tools like calculators; I'm not sure if Claude does that already or if it's still in development.


Claude has access to an analysis tool that runs JavaScript, which it can use for calculations.


You can also test your professor's answers. I don't just walk around going "Oh, Claude was right"; I'm literally using what I just learned and generating correct results. I'm not learning facts like dates or trivia, I'm learning laws, equations, theories, proofs, etc. (Like how to apply Euler's totient or his extended theorems on factorization... there's only one "right answer".)
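On the point about verifiable answers: Euler's totient and Euler's theorem are exactly the kind of thing you can check yourself in a few lines. A minimal sketch (the brute-force totient is for illustration; real code would work from the factorization):

```python
from math import gcd

def totient(n: int) -> int:
    """Euler's totient: count of 1 <= k <= n with gcd(k, n) == 1 (brute force)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Euler's theorem: a**phi(n) == 1 (mod n) whenever gcd(a, n) == 1
n, a = 10, 3
assert gcd(a, n) == 1
assert pow(a, totient(n), n) == 1

print(totient(10))  # 4  (1, 3, 7, and 9 are coprime to 10)
```

Either the identity holds for your worked example or it doesn't; there's no room for a plausible-sounding wrong answer.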

Also, your method for assessing your professor's accuracy is inherently flawed. That little piece of paper on their wall doesn't correlate with how accurate they are; it isn't worth nothing, but it isn't foolproof either. Hate to break it to you, but even heroes are fallible.



