Claude is like having my own college professor. I've learned more in the past month with Claude than I learned in the past year. I can ask questions repeatedly and get clarification as finely as I need it. Granted, Claude has limits, but it's a game-changer.
> I think the key to being successful here is to realize that you're still at the wheel as an engineer. The llm is there to rapidly synthesize the universe of information.
Bingo. OP is like someone who is complaining about the tools when they should be working on their talent. I have a LOT of hobbies (circuits, woodworking, surfing, playing live music, cycling, photography), and there will always be people who buy the best gear and complain that the gear sucks. (NOTE: I'm not implying Claude is "the best gear", but it's a big, big help.)
I think the only real problem with LLMs is that synthesis of new knowledge is severely limited. They are great at explaining things others have explained, but suck hard at inventing new things. At least that's my experience with Claude: it's terrible as a "greenfield" dev.
I don't use Claude, so maybe there's a huge gap in reliability between it and ChatGPT 4o. But with that disclaimer out of the way, I'm always fairly confused when people report experiences like these—IME, LLMs fall over miserably at even very simple pure math questions. Grammatical breakdowns of sentences (for a major language like Japanese) are also very hit-or-miss. I could see an LLM taking the place of, like, an undergrad TA, but even then only for very well-trod material in its training data.
(Or maybe I've just had better experiences with professors, making my standard for this comparison abnormally high :-P )
EDIT: Also, I figure this sort of thing must be highly dependent on which field you're trying to learn. But that decreases the utility of LLMs a lot for me, because it means I have to have enough existing experience in whatever I'm trying to learn about so that I can first probe whether I'm in safe territory or not.
"Major" is a rough fit in the context of Japanese: I can see a significant drop in quality when interacting with the same model in, say, Spanish vs. English.
For as rich a culture as the Japanese have, there are only about 1XX million speakers, and the size of the text corpus really matters here. The couple billion English speakers are also highly motivated to choose English over anything else, because a lingua franca has home-field advantage.
To use LLMs effectively you have to work with knowledge of their weaknesses. Math is a good example: you'll get better results from Wolfram Alpha even for the simple things, which is expected.
Broad reasoning and explanations tend to come out better than overly specific topics, and the more common a language, the better the response.
If a topic has a billion tutorials online, an LLM has a really high chance of figuring it out on the first try.
Be smart with the context you provide: the more you actively constrain an LLM, the more likely it is to work with you.
I have friends who just feed it class notes to generate questions, then probe it for blind spots until they're satisfied. The improvement in their grades makes it seem like a good approach, but they know that just feeding responses back to the LLM isn't trustworthy, so they also check everything themselves. The extra time is valuable in itself, if only to improve familiarity with the subject.
> LLMs fall over miserably at even very simple pure math questions
They are language models, not calculators, logic languages like Prolog, or proof languages like Coq. If you go in with that understanding, their capabilities make a lot more sense. I would understand the parent poster to mean that they can ask questions and rapidly synthesize information from what the LLM tells them, as a first start, rather than expecting it to be 100% correct on everything.
I think a lot of the people who object to AI probably see the gross amounts of energy it is using, or the trillions of dollars going to fewer than half a dozen men (mostly American, mostly white).
But, once you've had AI help you solve some gnarly problems, it is hard not to be amazed.
And this is coming from a gal who thinks the idea of self-driving cars is the biggest waste of resources ever.
(EDIT: Upon rereading this, it feels unintentionally blunt. I'm not trying to argue, and I apologize if my tone is somewhat unfriendly—that's purely a reflection of the fact that I'm a bad writer!)
Sorry, maybe I should've been clearer in my response—I specifically disagree with the "college professor" comparison. That is to say, in the areas I've tried using them for, LLMs can't even help me solve simple problems, let alone gnarly ones. Which is why hearing about experiences like yours leaves me genuinely confused.
I do get your point about people disagreeing with modern AI for "political" reasons, but I think it's inaccurate to lump everyone into that bucket. I, for one, am not trying to make any broader political statements or anything—I just genuinely can't see how LLMs are as practically useful as other people claim, outside of specific use cases.
> My college professor has certifications and has passed tests that weren't in their training data.
Granted, they are not (can't be) as rigorous as the tests your professor took, but new models are run through test suites before being released, too.
That being said, I saw my college professors making things up, too (mind you, they all graduated from very good schools). One example I remember was an argument with a professor who insisted there is a theoretical limit for the coefficient of friction, and that it is 1. That can fairly be categorised as a hallucination: it was completely made up and didn't make sense. Maybe it was in his training data (i.e. his own professors).
I agree with the "I don't know" part, though. That is something LLMs are notoriously bad at.
It is immediately wrong in Step 1. A newborn is not a 2:1 ratio of height:width. Certainly not 25cm width (what does that even mean? Shoulder to shoulder?).
This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.
Indeed you are showing that something not in the training data leads to failure.
Babies also aren't rectangles: you could lay a row shoulder to shoulder, then lay another row upside down relative to the first, and their heads would fit between the heads of the first row, saving space.
Edit: it also doesn't account for the fact the moon is more or less a sphere, and not a flat plane.
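To see how little the arithmetic matters next to the modeling, here's a sketch of the naive estimate being criticized: treat each newborn as a flat 50cm x 25cm rectangle (the 2:1 figures questioned upthread, used purely for illustration) and divide the Moon's surface area by that footprint. The answer comes out cleanly; it's the rectangle-and-flat-plane assumptions that are wrong.

```python
import math

# Naive back-of-envelope: Moon's surface area / per-baby footprint.
# Assumes rectangular babies on a flat surface -- exactly the
# assumptions the comments above object to.
MOON_RADIUS_M = 1.7374e6                  # mean lunar radius in meters
surface = 4 * math.pi * MOON_RADIUS_M**2  # sphere area, ~3.79e13 m^2
footprint = 0.50 * 0.25                   # 50cm x 25cm "rectangular" baby
print(f"{surface / footprint:.2e}")       # ~3.03e14 babies
```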
Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...
Edit: couldn't resist, and dammit!!
Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.
But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?
Did you actually test the math it did? Usually LLMs are terrible at math because, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes as LLMs leverage other tools like calculators to get their results; I'm not sure if Claude does that already or if it's still in development.
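As a rough sketch of what "leveraging a calculator" could mean (this is illustrative, not any particular vendor's tool-use API), here's the kind of minimal arithmetic evaluator a tool-calling setup might hand math off to, instead of letting the model generate digits token by token:

```python
import ast
import operator

# Minimal safe arithmetic evaluator -- the sort of "calculator tool"
# an LLM harness could delegate math to. Only plain arithmetic is
# allowed; anything else raises instead of being guessed at.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calc(expr: str):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

print(calc("2**10 - 24"))  # 1000
```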
You can also test your professor's answers. I don't just walk around going "Oh, Claude was right"; I'm literally using what I just learned and generating correct results. I'm not learning facts like dates or trivia; I'm learning laws, equations, theories, proofs, etc. (like how to apply Euler's totient or his extended theorems on factorization... there's only one "right answer").
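Euler's theorem is exactly the kind of claim you can verify yourself in a few lines. A minimal sketch, trial-counting phi(n) for small n:

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by trial counting -- fine for small n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Euler's theorem: if gcd(a, n) == 1, then a**phi(n) == 1 (mod n).
n, a = 20, 3
assert gcd(a, n) == 1
assert phi(n) == 8               # units mod 20: 1,3,7,9,11,13,17,19
assert pow(a, phi(n), n) == 1    # 3**8 mod 20 == 1
print(phi(n), pow(a, phi(n), n)) # 8 1
```

If the model had taught the theorem wrong, the last assertion would fail, which is the whole point: the material is self-checking.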
Also, your method for attesting to your professor's accuracy is inherently flawed. That little piece of paper on their wall doesn't correlate with how accurate they are; it isn't worth nothing, but it isn't foolproof either. Hate to break it to you, but even heroes are fallible.
Simple: one thing I'm learning about is RFCs for TCP/IP. I can literally go test it. It's like saying, "How do you know it is right when it says 2+2=4"? Some knowledge when taught is self-correcting. Other things I'm studying, like, say, tensor calculus, I can immediately use and know I learned it correctly.
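To make "go test it" concrete, here's a small sketch that packs the 20-byte fixed TCP header as laid out in RFC 793/9293 using Python's `struct`, so you can check an LLM's description of the field order against the spec itself (the port and flag values below are arbitrary examples):

```python
import struct

# Fixed TCP header per RFC 793/9293, network byte order:
# src port (16), dst port (16), seq (32), ack (32),
# data offset/reserved (8), flags (8), window (16),
# checksum (16), urgent pointer (16) -- 20 bytes with no options.
def tcp_header(src=12345, dst=80, seq=0, ack=0, flags=0x02, window=65535):
    offset_byte = 5 << 4  # data offset: 5 x 32-bit words, reserved bits zero
    return struct.pack("!HHIIBBHHH", src, dst, seq, ack,
                       offset_byte, flags, window, 0, 0)

hdr = tcp_header()                        # flags=0x02 is a bare SYN
src, dst = struct.unpack("!HH", hdr[:4])
print(len(hdr), src, dst)                 # 20 12345 80
```

If the model describes the header fields in the wrong order or size, the byte layout won't match the RFC and the discrepancy shows up immediately.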
TCP/IP is a great example though of something you can get seemingly correct and then be subject to all kinds of failure modes in edge cases you didn’t handle correctly (fragmentation, silly windows, options changing header sizes, etc).