Claude is like having my own college professor. I've learned more in the past month with Claude than I learned in the past year. I can ask questions repeatedly and get clarification as finely as I need it. Granted, Claude has limits, but it's a game-changer.
> I think the key to being successful here is to realize that you're still at the wheel as an engineer. The llm is there to rapidly synthesize the universe of information.
Bingo. OP is like someone who is complaining about the tools when they should be working on their talent. I have a LOT of hobbies (circuits, woodworking, surfing, playing live music, cycling, photography), and there will always be people who buy the best gear and complain that the gear sucks. (NOTE: I'm not implying Claude is "the best gear", but it's a big, big help.)
I think the only real problem with LLMs is that synthesis of new knowledge is severely limited. They are great at explaining things others have explained, but suck hard at inventing new things. At least that's my experience with Claude: it's terrible as a "greenfield" dev.
I don't use Claude, so maybe there's a huge gap in reliability between it and ChatGPT 4o. But with that disclaimer out of the way, I'm always fairly confused when people report experiences like these—IME, LLMs fall over miserably at even very simple pure math questions. Grammatical breakdowns of sentences (for a major language like Japanese) are also very hit-or-miss. I could see an LLM taking the place of, like, an undergrad TA, but even then only for very well-trod material in its training data.
(Or maybe I've just had better experiences with professors, making my standard for this comparison abnormally high :-P )
EDIT: Also, I figure this sort of thing must be highly dependent on which field you're trying to learn. But that decreases the utility of LLMs a lot for me, because it means I have to have enough existing experience in whatever I'm trying to learn about so that I can first probe whether I'm in safe territory or not.
"Major" is a rough fit in the context of Japanese: I can see a significant drop in quality when interacting with the same model in, say, Spanish vs. English.
For as rich a culture as the Japanese have, there are only about 1XX million speakers, and the size of the text corpus really matters here. The couple billion English speakers are also highly motivated to choose English over anything else, because a lingua franca has home-field advantage.
To use LLMs effectively you have to work with knowledge of their weaknesses. Math is a good example: you'll get better results from Wolfram Alpha even for the simple things, which is expected.
Broad reasoning and explanations tend to come out better than overly specific topics, and the more common a language, the better the response.
If a topic has a billion tutorials online, an LLM has a really high chance of figuring it out on the first try.
Be smart with the context you provide: the more you actively constrain an LLM, the more likely it is to work with you.
I have friends who just feed it class notes to generate questions, then probe it for blind spots until they're satisfied. The improvement in their grades makes it seem like a good approach, but they know that just feeding responses back to the LLM isn't trustworthy, so they also check everything themselves. The extra time is valuable in itself, if only to improve familiarity with the subject.
> LLMs fall over miserably at even very simple pure math questions
They are language models, not calculators, logic languages like Prolog, or proof languages like Coq. If you go in with that understanding, their capabilities make a lot more sense. I would understand the parent poster to mean that they can ask questions and rapidly synthesize information from what the LLM tells them, as a first start, rather than expecting it to be 100% correct on everything.
I think a lot of the people who object to AI probably see the gross amounts of energy it is using, or the trillions of dollars going to fewer than half a dozen men (mostly American, mostly white).
But, once you've had AI help you solve some gnarly problems, it is hard not to be amazed.
And this is coming from a gal who thinks the idea of self-driving cars is the biggest waste of resources ever.
(EDIT: Upon rereading this, it feels unintentionally blunt. I'm not trying to argue, and I apologize if my tone is somewhat unfriendly—that's purely a reflection of the fact that I'm a bad writer!)
Sorry, maybe I should've been clearer in my response—I specifically disagree with the "college professor" comparison. That is to say, in the areas I've tried using them for, LLMs can't even help me solve simple problems, let alone gnarly ones. Which is why hearing about experiences like yours leaves me genuinely confused.
I do get your point about people disagreeing with modern AI for "political" reasons, but I think it's inaccurate to lump everyone into that bucket. I, for one, am not trying to make any broader political statements or anything—I just genuinely can't see how LLMs are as practically useful as other people claim, outside of specific use cases.
> My college professor has certifications and has passed tests that weren't in their training data.
Granted, they are not (can't be) as rigorous as the tests your professor took, but new models are run through test suites before being released, too.
That being said, I saw my college professors making things up, too (mind you, they all graduated from very good schools). One example I remember was an argument with a professor who insisted there is a theoretical limit for the coefficient of friction, and that it is 1. That can fairly be categorised as a hallucination: it was completely made up and didn't make sense. Maybe it was in his training data (i.e. his own professors).
I agree with the "I don't know" part, though. That is something LLMs are notoriously bad at.
It is immediately wrong in Step 1. A newborn is not a 2:1 ratio of height:width. Certainly not 25cm width (what does that even mean? Shoulder to shoulder?).
This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.
Indeed you are showing that something not in the training data leads to failure.
Babies also aren't rectangles: you could lay a row shoulder to shoulder, then lay another row upside down relative to the first, and their heads would fit between the heads of the first row, saving space.
Edit: it also doesn't account for the fact the moon is more or less a sphere, and not a flat plane.
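To see how little the arithmetic matters next to the modeling, here's a sketch of the naive estimate being criticized: treat each newborn as a flat 50cm x 25cm rectangle (the 2:1 figures questioned upthread, used purely for illustration) and divide the Moon's surface area by that footprint. The answer comes out cleanly; it's the rectangle-and-flat-plane assumptions that are wrong.

```python
import math

# Naive back-of-envelope: Moon's surface area / per-baby footprint.
# Assumes rectangular babies on a flat surface -- exactly the
# assumptions the comments above object to.
MOON_RADIUS_M = 1.7374e6                  # mean lunar radius in meters
surface = 4 * math.pi * MOON_RADIUS_M**2  # sphere area, ~3.79e13 m^2
footprint = 0.50 * 0.25                   # 50cm x 25cm "rectangular" baby
print(f"{surface / footprint:.2e}")       # ~3.03e14 babies
```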
Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...
Edit: couldn't resist, and dammit!!
Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.
But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?
Did you actually test the math it did? Usually LLMs are terrible at math because, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes as LLMs leverage other tools like calculators to get their results; I'm not sure if Claude does that already or if it's still in development.
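As a rough sketch of what "leveraging a calculator" could mean (this is illustrative, not any particular vendor's tool-use API), here's the kind of minimal arithmetic evaluator a tool-calling setup might hand math off to, instead of letting the model generate digits token by token:

```python
import ast
import operator

# Minimal safe arithmetic evaluator -- the sort of "calculator tool"
# an LLM harness could delegate math to. Only plain arithmetic is
# allowed; anything else raises instead of being guessed at.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calc(expr: str):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

print(calc("2**10 - 24"))  # 1000
```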
You can also test your professor's answers. I don't just walk around going "Oh, Claude was right"; I'm literally using what I just learned and generating correct results. I'm not learning facts like dates or trivia; I'm learning laws, equations, theories, proofs, etc. (like how to apply Euler's totient or his extended theorems on factorization... there's only one "right answer").
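Euler's theorem is exactly the kind of claim you can verify yourself in a few lines. A minimal sketch, trial-counting phi(n) for small n:

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by trial counting -- fine for small n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Euler's theorem: if gcd(a, n) == 1, then a**phi(n) == 1 (mod n).
n, a = 20, 3
assert gcd(a, n) == 1
assert phi(n) == 8               # units mod 20: 1,3,7,9,11,13,17,19
assert pow(a, phi(n), n) == 1    # 3**8 mod 20 == 1
print(phi(n), pow(a, phi(n), n)) # 8 1
```

If the model had taught the theorem wrong, the last assertion would fail, which is the whole point: the material is self-checking.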
Also, your method for attesting to your professor's accuracy is inherently flawed. That little piece of paper on their wall doesn't correlate with how accurate they are; it isn't worth nothing, but it isn't foolproof either. Hate to break it to you, but even heroes are fallible.
Simple: one thing I'm learning about is RFCs for TCP/IP. I can literally go test it. It's like saying, "How do you know it is right when it says 2+2=4"? Some knowledge when taught is self-correcting. Other things I'm studying, like, say, tensor calculus, I can immediately use and know I learned it correctly.
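To make "go test it" concrete, here's a small sketch that packs the 20-byte fixed TCP header as laid out in RFC 793/9293 using Python's `struct`, so you can check an LLM's description of the field order against the spec itself (the port and flag values below are arbitrary examples):

```python
import struct

# Fixed TCP header per RFC 793/9293, network byte order:
# src port (16), dst port (16), seq (32), ack (32),
# data offset/reserved (8), flags (8), window (16),
# checksum (16), urgent pointer (16) -- 20 bytes with no options.
def tcp_header(src=12345, dst=80, seq=0, ack=0, flags=0x02, window=65535):
    offset_byte = 5 << 4  # data offset: 5 x 32-bit words, reserved bits zero
    return struct.pack("!HHIIBBHHH", src, dst, seq, ack,
                       offset_byte, flags, window, 0, 0)

hdr = tcp_header()                        # flags=0x02 is a bare SYN
src, dst = struct.unpack("!HH", hdr[:4])
print(len(hdr), src, dst)                 # 20 12345 80
```

If the model describes the header fields in the wrong order or size, the byte layout won't match the RFC and the discrepancy shows up immediately.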
TCP/IP is a great example though of something you can get seemingly correct and then be subject to all kinds of failure modes in edge cases you didn’t handle correctly (fragmentation, silly windows, options changing header sizes, etc).