
> My first-hand experience tells me that ChatGPT and even ClaudeCode are nowhere near the expertise they are touted to have

Not doubting you, but where specifically have the latest models fallen short for you?





ClaudeCode:

- Making functions async without need; it doesn't seem to know the difference between sync and async functions or in which scenarios each is appropriate (see the sketch after this list).

- Consistently fails to make changes to the frontend if a project grows above 5000 LOC or a file goes near 1000 LOC.

- The worst part is that it lies about the changes it has just made.
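
To make the first point (unnecessary async) concrete, here is a minimal sketch, not taken from my codebase and with a made-up function name, of the kind of change I keep reverting:

    // What Claude Code tends to produce: async with no await inside,
    // so every caller now has to deal with a Promise for no reason.
    async function formatLogLine(line: string): Promise<string> {
      return line.trim().toUpperCase();
    }

    // What the code actually needs: a plain synchronous function.
    function formatLogLineSync(line: string): string {
      return line.trim().toUpperCase();
    }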

ChatGPT:

- Fails to implement moderately complex functionality, such as auto-scrolling to the bottom when new log lines arrive while leaving the scroll position alone when the user is reviewing historical logs (a sketch of that behavior follows).
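
For reference, the behavior I was asking for is roughly this (a minimal browser TypeScript sketch; the element id and the pixel threshold are my own assumptions, not ChatGPT's output):

    // Auto-scroll only while the user is already at (or near) the bottom;
    // leave the scroll position alone when they are reading older logs.
    const logEl = document.getElementById('log-container') as HTMLElement;

    function isNearBottom(el: HTMLElement, threshold = 40): boolean {
      return el.scrollHeight - el.scrollTop - el.clientHeight <= threshold;
    }

    function appendLogLine(text: string): void {
      const pinned = isNearBottom(logEl);
      const line = document.createElement('div');
      line.textContent = text;
      logEl.appendChild(line);
      if (pinned) {
        logEl.scrollTop = logEl.scrollHeight; // stick to the newest line
      }
    }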

These models are good at mainstream tasks, the kind whose snippets show up all over public repositories. Try something off-beat, such as algorithmic trading, and they fail spectacularly.


I'm unsure how someone could use LLMs regularly and not encounter significant mistakes. I use them a lot less than some devs and still run into basic errors pretty often, to the point that I rarely bother using them for niche or complicated problems even though they are pretty helpful in other cases. Just in the past few days I've had Claude trip all over itself on multiple basic tasks.

One case was asking how to do a straightforward thing with a popular open source JavaScript library, right in the sweet spot of what models should excel at. Claude's whole approach was broken because it relied on a hallucinated library parameter, one that doesn't exist and has no equivalent. It invented a keyword that appears nowhere in the library's repository, to control functionality the library doesn't even have.


Care to share a chat link?



