Hacker News | hxtk's comments

When I think "premature optimization," I think of things like making a tradeoff in favor of performance without justification. It could be a sacrifice of readability by writing uglier but more optimized code that's difficult to understand, or spending time researching the optimal write pattern for a database that I could spend developing other things.

I don't think I should ignore what I already know and intentionally pessimize the first draft in the name of avoiding premature optimization.


But you have to get money into your crypto wallet somehow, which makes it relatively easy to deanonymize for most users (serious crypto privacy enthusiasts could of course pay cash for their crypto or perhaps mine it themselves) if they're looking at your traffic specifically, but hard if you're only worried about bulk collection.

IMO the coolest privacy option they have is to literally mail them an envelope full of cash with just your account's cash payment ID.


You can exchange common coins for Monero, and Monero is fully private.

The change log for developers/administrators/etc. comes from the Git history. We use conventional commits. A breaking change in the semantic version of a subcomponent means, "If you're operating this service, you can't just bump the container version and keep your current config."
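As a rough sketch (not our actual tooling), grouping conventional-commit subjects into a draft changelog can be a few lines of scripting; the tag range below is hypothetical:

    # Sketch: group conventional-commit subjects from `git log` into a draft changelog.
    # Assumes subjects look like "feat(scope): ...", "fix: ...", "feat!: ..." (breaking).
    import re
    import subprocess
    from collections import defaultdict

    subjects = subprocess.run(
        ["git", "log", "--pretty=%s", "v1.2.0..HEAD"],   # hypothetical release range
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    pattern = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<bang>!)?:\s*(?P<desc>.+)$")
    sections = defaultdict(list)
    for subject in subjects:
        m = pattern.match(subject)
        if m:
            key = "BREAKING" if m.group("bang") else m.group("type")
            sections[key].append(m.group("desc"))

    for key in ("BREAKING", "feat", "fix"):
        for desc in sections[key]:
            print(f"{key}: {desc}")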

The change log for end users comes from the JIRA board for the release, looking at what tickets got closed that release cycle, and it usually requires some amount of human effort to rewrite.


I always think about the way you discover the problem. I used to say the same about RNG: if you need fast PRNG and you pick CSPRNG, you’ll find out when you profile your application because it isn’t fast enough. In the reverse case, you’ll find out when someone successfully guesses your private key.

If you need performance and you pick data integrity, you find out when your latency gets too high. In the reverse case, you find out when a customer asks where all their data went.
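For the RNG example specifically, in Python the distinction is basically one import; a trivial sketch:

    import random   # fast, seedable PRNG: fine for simulations, never for secrets
    import secrets  # OS-backed CSPRNG: for tokens and keys; slower, not reproducible

    simulation_sample = random.random()        # profiling tells you if this is too slow
    session_token = secrets.token_urlsafe(32)  # guessing this should be infeasible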


CockroachDB is serializable by default, but I don’t know about their other settings.

iPhone had the 12 and 13 mini, but they didn't sell, so there was no iPhone 14 mini and there hasn't been one since. That was a 5.4" display.

> but they didn't sell

What do you call "didn't sell"? In numbers.



In March of that year, but it was higher before, no? Anyway, that's ~3 million phones. It's a lot.

I'm making this distinction post-hoc, so I already know how it turned out: they ultimately decided to stop making those smaller devices. I assume that means it wasn't enough sales to be a financially viable product, and to me, selling "enough" would mean that Apple found it profitable to maintain the supply chains and assembly lines for those smaller devices and continued to invest in the product.

Arguing against myself, Apple could be discontinuing the smaller models because they did market research and found that most buyers of smaller, cheaper devices could be converted to buyers of larger, more expensive devices if those smaller devices didn't exist. Auto manufacturers are doing just that, discontinuing or enlarging smaller light trucks in favor of larger models that are subject to less regulation and therefore can be designed and manufactured more cheaply and might offer even more profit.

If Apple has or had that strategy, then my assumptions are flawed because no matter how many mini iPhones they sell, they would still want to get rid of the line as long as most of those customers could be converted to full-size iPhone customers.


It’s a hell of a lot of phones.

I skipped that one because my SE 2 was less than a year old and I didn't want to go up a size.

I cherish my iPhone 12 Mini and treat it with great care as it is the form factor I want and I want it to last as long as possible.

Same here. Battery is a bit bad though. But I carry a battery pack. I do notice the charging port is also getting worse recently. I don't know what I'll buy after this...

I suspect trading firms have already done this to the maximum extent that it's profitable to do so. I think if you were to integrate LLMs into a trading algorithm, you would need to incorporate more than just signals from the market itself. For example, I hazard a guess you could outperform a model that operates purely on market data with a model that also includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs.

The part people are missing here is that if the trading firms are all doing something, that in itself influences the market.

If they are all giving the LLMs money to invest and the AIs generally buy the same group of stocks, those stocks will go up. As more people attempt the strategy, it infuses fresh capital and, more importantly, signals to the trading firms that there are inflows to these stocks. I think it's probably a reflexive loop at this point.


They could have the AI perform paper trading: give it a simulated account but real data. This would make sense to me if it was just a research project. That said, I imagine the more high-tech trading firms started running this research a long time ago and wouldn't be surprised if there were already LLM-based trading bots that could be influencing the market.

"includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs."

Not really. Sentiment analysis in social networks has been around for years. It's probably cheaper to buy that analysis and feed it to LLMs than to have LLMs do it.


The problem with function coloring is that it makes libraries difficult to implement in a way that's compatible with both sync and async code.

In Python, I needed to write both sync and async API clients for some HTTP thing where the logical operations were composed of several sequential HTTP requests. Doing so meant implementing the core business logic as a generator that yields requests and accepts responses before ultimately returning the final result, and then writing sync and async drivers that each ran the generator in a loop, pulling requests off, transacting them with their HTTP implementation, and feeding the responses back to the generator.
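A stripped-down sketch of that shape (the endpoints and names here are hypothetical, not the actual client):

    # Core logic is a generator: it yields request descriptions, receives
    # responses, and returns its final result via StopIteration. No IO in here.
    def login_flow(username, password):
        challenge = yield ("GET", "/challenge", None)          # hypothetical endpoint
        token = yield ("POST", "/login", {"user": username,
                                          "pass": password,
                                          "nonce": challenge["nonce"]})
        return token["session_id"]

    # Sync driver: transacts each yielded request with a blocking HTTP client.
    def run_sync(gen, http_send):
        response = None
        while True:
            try:
                request = gen.send(response)
            except StopIteration as done:
                return done.value
            response = http_send(*request)

    # Async driver: the same loop, but awaiting a coroutine-based HTTP client.
    async def run_async(gen, http_send):
        response = None
        while True:
            try:
                request = gen.send(response)
            except StopIteration as done:
                return done.value
            response = await http_send(*request)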

This sans-IO approach, where the library separates business logic from IO and then either provides or asks the caller to implement their own simple event loop for performing IO in their chosen method and feeding it to the business logic state machine, has started to appear as a solution to function coloring in Rust, but it's somewhat of an obtuse way to support multiple IO concurrency strategies.

On the other hand, I do find it an extremely useful pattern for testability, because it results in very fuzz-friendly business logic implementation, isolated side-effect code, and a very simple core IO loop without much room in it for bugs, so despite being somewhat of a pain to write I still find it desirable at times even when I only need to support one of the two function colors.


My opinion is that if your library or function is doing IO, it should be async - there is no reason to support "sync I/O".

Also, this "sans IO" trend is interesting, but the code boils down to a less ergonomic, more verbose, and less efficient version of async (in Rust). It's async/await with more steps, and I would argue those steps are not great.


> there is no reason to support "sync I/O"

I disagree strongly.

From a performance perspective, asynchronous IO makes a lot of sense when you're dealing concurrently with a large number of tasks which each spend most of their time waiting for IO operations to complete. In this case, running those tasks in a single-threaded event loop is far more efficient than launching off thousands of individual threads.

However, if your application falls into literally any other category, then suddenly you are actually paying a performance penalty, since you need the overhead of running an event loop any time you just want to perform some IO.

Also, from a correctness perspective, non-concurrent code is simply a lot less complex and a lot harder to get wrong than concurrent code. So applications which don't need async also end up paying a maintainability, and in some cases memory safety / thread safety, penalty as well.


The beautiful thing about the “async” abstraction is that it doesn’t actually tie you to an event loop at all. Nothing about it implies that somebody is calling `epoll_wait` or similar anywhere in the stack.

It’s just a compiler feature that turns functions into state machines. It’s totally valid to have an async runtime that moves a task to a thread and blocks whenever it does I/O.
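The thread is about Rust, but the same point can be shown concretely in Python, where a coroutine is just an object you drive by hand; this is a toy sketch of a "runtime" that blocks a thread for each IO operation, not a production design:

    # Toy "runtime" that drives an async function with plain blocking IO and no
    # event loop: the coroutine yields IO requests, the driver fulfils them.
    import socket

    class Op:
        """Awaitable that surfaces an IO request to whoever drives the coroutine."""
        def __init__(self, kind, payload):
            self.kind, self.payload = kind, payload
        def __await__(self):
            return (yield self)   # hand the request up, resume with the driver's result

    async def fetch_head(host):
        sock = await Op("connect", (host, 80))
        await Op("send", (sock, b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"))
        return await Op("recv", (sock, 4096))

    def run_blocking(coro):
        result = None
        while True:
            try:
                op = coro.send(result)
            except StopIteration as done:
                return done.value
            if op.kind == "connect":
                result = socket.create_connection(op.payload)
            elif op.kind == "send":
                op.payload[0].sendall(op.payload[1])
                result = None
            elif op.kind == "recv":
                result = op.payload[0].recv(op.payload[1])

    print(run_blocking(fetch_head("example.com")))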

I do agree that async without memory safety and thread safety is a nightmare (just like all state machines are under those circumstances). Thankfully, we have languages now that all but completely solve those issues.


You surely must be referring to Rust, the only multithreaded language with async-await in which data races aren't possible.

Rust is lovely and all, but is a bad example for the performance side of the argument, since in practice libraries usually have to decide on an async runtime, so in practice library users have to launch that runtime (usually Tokio) to execute the library's Futures.


Sure, but that’s a library limitation (no widespread common runtime interface that libraries such as Tokio implement), not a fundamental limitation of async.

Thread safety is also a lot easier to achieve in languages like C#, and then of course you have single-threaded environments like JS and Python.


I believe it. If the AI ever asks me permission to say something, I know I have to regenerate the response because if I tell it I'd like it to continue it will just keep double and triple checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy and says "generating now..." and ends the message.

Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."

The principles of Motivational Interviewing that are extremely successful in influencing humans to change are even more pronounced in AI, namely with the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once because that'll be part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt if there's an alignment issue. I'll almost never correct it and continue the thread.


While I never measured it, this aligns with my own experiences.

It's better to have very shallow conversations where you keep regenerating outputs aggressively, only picking the best results. Asking for fixes, restructuring, or elaborations on generated content has fast diminishing returns. And once it has made a mistake (or hallucinated), it will not stop erring even if you provide evidence that it is wrong; LLMs just commit to certain things very strongly.


I largely agree with this advice but in practice using Claude Code / Codex 4+ hours a day, it's not always that simple. I have a .NET/React/Vite webapp that despite the typical stack has a lot of very specific business logic for a real world niche. (Plus some poor early architectural decisions that are being gradually refactored with well documented rules).

I frequently see (both) agents make wrong assumptions that inevitably take multiple turns of failure before they recognize the correct solution.

There can be like a magnetic pull where no matter how you craft the initial instructions, they will both independently have a (wrong) epiphany and ignore half of the requirements during implementation. It takes messing up once or twice for them to accept that their deep intuition from training data is wrong and pivot. In those cases I find it takes less time to let that process play out vs recrafting the perfect one shot prompt over and over. Of course once we've moved to a different problem I would definitely dump that context ASAP.

(However, what is cool working with LLMs, to counterbalance the petty frustrations that sometimes make it feel like a slog, is that they have extremely high familiarity with the jargon/conventions of that niche. I was expecting to have to explain a lot of the weird, too-clever-by-half abbreviations in the legacy VBA code from 2004 it has to integrate with, but it pretty much picks up on every little detail without explanation. It's always a fun reminder that they were created to be super translators, even within the same language but from jargon -> business logic -> code that kinda works.)


A human would cross out that part of the worksheet, but an LLM keeps re-reading the wrong text.

I never had a conversation like that — probably because I personally rarely use LLMs to actually generate code for me — but I've somehow subconsciously learned to do this myself, especially with clarifying questions.

If I find myself needing to ask a clarifying question, I always edit the previous message to ask the next question because the models seem to always force what they said in their clarification into further responses.

It's... odd... to find myself conditioned, by the LLM, to the proper manners of conditioning the LLM.


Blameless postmortem culture recognizes human error as an inevitability and asks those with influence to design systems that maintain safety in the face of human error. In the software engineering world, this typically means automation, because while automation can and usually does have faults, it doesn't suffer from human error.

Now we've invented automation that commits human-like error at scale.

I wouldn't call myself anti-AI, but it does seem fairly obvious to me that directly automating things with AI will probably always carry substantial risk, and that you get much more assurance, if you involve AI in the process, by using it to develop a traditional automation. As a low-stakes personal example, instead of using AI to generate boilerplate code, I'll often try to use AI to generate a traditional code generator that converts whatever DSL specification into the chosen development language's source code, rather than asking AI to generate the development language source code directly from the DSL.
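As a toy illustration of what "a traditional code generator from a DSL" means here (the spec format and field names are made up for this example), the generator itself stays deterministic and reviewable even if an AI helped write it:

    # Toy generator: turn a tiny made-up DSL spec into Python dataclass source.
    # The point is that the generator is ordinary, deterministic, reviewable code.
    import json

    SPEC = json.loads('{"Message": {"id": "int", "body": "str", "tags": "list[str]"}}')

    def generate(spec):
        lines = ["from dataclasses import dataclass, field", ""]
        for name, fields in spec.items():
            lines.append("@dataclass")
            lines.append(f"class {name}:")
            for fname, ftype in fields.items():
                default = " = field(default_factory=list)" if ftype.startswith("list") else ""
                lines.append(f"    {fname}: {ftype}{default}")
            lines.append("")
        return "\n".join(lines)

    print(generate(SPEC))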


Yeah, I see things like "AI firewalls" as, firstly, ridiculously named, but I also consider the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that this will prevent errors to be lunacy.

For tasks that aren't customer-facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someone's customers directly, I just get sort of anxious.

A big one I saw was a tool that ingested a human's report on a safety incident, adjusted it with an LLM, and then posted the result to an OHS incident log. 99% of the time it's going to be fine; then someone's going to die, the log will have a recipe for spicy noodles in it, and someone's going to jail.


The Air Canada chatbot that mistakenly told someone they could cancel and be refunded for a flight due to a bereavement is a good example of this. It went to court and they had to honour the chatbot's response.

It’s quite funny that a chatbot has more humanity than its corporate human masters.


Not AI, but a similar-sounding incident in Norway. Some traders found a way to exploit another company's trading bot at the Oslo Stock Exchange. The case went to court. And the court's ruling? "Make a better trading bot."

I am so glad to read this. Last I had read on the case was that the traders were (outrageously) convicted of market manipulation: https://www.cnbc.com/2010/10/14/norwegians-convicted-for-out...

But you are right, they appealed and had their appeal upheld by the Supreme Court: https://www.finextra.com/newsarticle/23677/norwegian-court-a...

I am so glad at the result.


Chatbots have no fear of being fired; most humans would do the same in a similar position.

More to the point, most humans loudly declare they would do the right thing, so all the chatbot's training data is on people doing the right thing. There are comparatively fewer loud public pronouncements of personal cowardice, so if the bot's going to write a realistic completion, it's more likely to conjure an author acting heroically.

Do they not? If a chatbot isn't doing what its owners want, won't they just shut it down? Or switch to a competitor's chatbot?

"... adding fear into system prompt"

What a nice side effect. Unfortunately they'll lock chatbots behind more barriers in the future, but that's ironic.

...And under pressure, those barriers will fail, too.

It is not possible, at least with any of the current generations of LLMs, to construct a chatbot that will always follow your corporate policies.


That's what people aren't understanding, it seems.

You are providing people with an endlessly patient, endlessly novel, endlessly naive employee to attempt your social engineering attacks on. Over and over and over. Hell, it will even provide you with reasons for its inability to answer your question, allowing you to fine-tune your attacks faster and easier than with a person.

Until true AI exists, there are no actual hard-stops, just guardrails that you can step over if you try hard enough.

We recently cancelled a contract with a company because they implemented student facing AI features that could call data from our student information and learning management systems. I was able to get it to give me answers to a test for a class I wasn't enrolled in and PII for other students, even though the company assured us that, due to their built-in guardrails, it could only provide general information for courses that the students are actively enrolled in (due dates, time limits, those sorts of things). Had we allowed that to go live (as many institutions have), it was just a matter of time before a savvy student figured that out.

We killed the connection with that company the week before finals, because the shit-show of fixing broken features was less of a headache than unleashing hell on our campus in the form of a very friendly chatbot.


With a chat AI + a guardrail AI, it will probably get to the point of being reliable enough that the number of mistakes won't hit the bottom line.

...and we will find a way to turn it into malicious compliance, where rules are not broken but the stuff the corporation wanted to happen doesn't.


Efficiency, not money, seems to be the currency of chatbots

That policy would be fraudulently exploited immediately. So is it more humane or more gullible?

I suppose it would hallucinate a different policy if it includes in the context window the interests of shareholders, employees and other stakeholders, as well as the customer. But it would likely be a more accurate hallucination.


> 99% of the time it's going to be fine; then someone's going to die, the log will have a recipe for spicy noodles in it, and someone's going to jail.

I agree, and also I am now remembering Terry Pratchett's (much lower stakes) reason for getting angry with his German publisher: https://gmkeros.wordpress.com/2011/09/02/terry-pratchett-and...

Which is also the kind of product placement that comes up at least once in every thread about how LLMs might do advertising.


> … LLMs might do advertising.

It’s no longer “might”. There was very recently a leak that OpenAI is actively working on this.


It's "how LLMs might do" it right up until we see what they actually do.

There's lots of other ways they might do it besides this way.


Even if they don't offer it, people will learn how to poison AI corpora just like they did with search results.

We ain't safe from aggressive AI ads either way.


You seem to be indulging in wishful thinking.

"I see you're annoyed with that problem, did you ate recently ? There is that restaurant that gets great reviews near you, and they have a promotion!"

> the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that this will prevent errors to be lunacy

It usually works, though. There are no guarantees of course, but sanity checking an LLM's output with another instance of itself usually does work, because LLMs usually aren't reliably wrong in the same way. For instance, if you ask it something it doesn't know and it hallucinates a plausible answer, another instance of the same LLM is unlikely to hallucinate the same exact answer; it'll probably give you a different answer, which is your heads-up that probably both are wrong.
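In code, that "ask twice, flag disagreement" idea is roughly the following sketch; ask_llm is a stand-in for whatever client you actually call, and the similarity check is deliberately crude:

    # Crude sketch of sanity-checking one sample against another.
    # ask_llm() is a placeholder, not a real API; the string-similarity test
    # would need a proper comparison (or a judge prompt) in practice.
    from difflib import SequenceMatcher

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for a real API call")

    def cross_checked(prompt: str, threshold: float = 0.8):
        a = ask_llm(prompt)
        b = ask_llm(prompt)   # fresh instance/sample, no shared context
        agreement = SequenceMatcher(None, a, b).ratio()
        if agreement < threshold:
            return None, (a, b)   # disagreement: treat both answers as suspect
        return a, None            # rough consensus: still verify anything that matters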


Yeah, but real firewalls are deterministic. Hoping that a second non-deterministic thing will make something more deterministic is weird.

Probably it will usually work, just like the LLM can probably usually go unsupervised. But that 1% error rate in production is going to add up fast.


Sure, and then you can throw another LLM in and make them come to a consensus; of course that could be wrong too, so have another three do the same and then compare, and then…

Or maybe it will be a circle of LLMs all coming up with different responses and all telling each other "You're absolutely right!"

I have an ongoing and endless debate with a PhD who insists consensus of multiple LLMs is a valid proof check. The guy is a neuroscientist, not at all a developer tech head, and is just stubborn, continually projecting a sentient-being perspective onto his LLM usage.

This, but unironically. It's not much different from the way human unreliability is accounted for. Add more until you're satisfied a suitable ratio of mistakes will be caught.

It's "wonderfully" human way.

Just like you sometimes need a senior/person in power to tell the junior "no, you can't just promise the project manager a shorter deadline with no change in scope, and if the PM has a problem with that they can talk to me", we now need a Judge Dredd AI to keep the law when other AIs are bullied into misbehaving.


> For tasks that aren't customer-facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someone's customers directly, I just get sort of anxious

Especially since every mainstream model has been human preference-tuned to obey the requests of the user…

I think you may be able to have an LLM customer facing, but it would have to be a purpose-trained one from a base model, not a repurposed sycophantic chat assistant.


Exactly what I've been worrying about for a few months now [0]. Arguments like "well at least this is as good as what humans do, and much faster" are fundamentally missing the point. Humans output things slowly enough that other humans can act as a check.

[0] https://news.ycombinator.com/item?id=44743651


I've heard people working in the construction industry mention that the quality of design fell off a cliff when the industry began to use computers more widely – less time and fewer people involved. The same is true of printing – there was much more time, and there were more people, in the loop before computers. My grandmother worked with a Linotype machine printing newspapers. They were really good at catching and fixing grammar errors, sometimes even catching factual errors, etc.

looks at the current state of the US government

Do they? Because near as I can tell, speed running around the legal system - when one doesn’t have to worry about consequences - works just fine.


That's a good point. I'm talking specifically in the context of deploying code. The potential for senior devs to be totally overwhelmed with the work of reviewing junior devs' code is limited by the speed at which junior devs create PRs.

So today? With ML tools?

Could you explain what you mean, please?

Junior devs can currently create CLs/PRs faster than the senior can review them.

Indeed. In the language of the post I linked [0]: it's currently an occasional problem, and it risks becoming a widespread rot.

[0] https://news.ycombinator.com/item?id=44743651


> Now we've invented automation that commits human-like error at scale.

Then we can apply the same (or similar) guardrails that we'd like to use for humans, to also control the AI behavior.

First, don't give them unsafe tools. Sandbox them within a particular directory (honestly this should be how things work for most of your projects, especially since we pull code from the Internet), even if a lot of tools give you nothing in this regard. Use version control for changes, with the ability to roll back. Also have ample tests and code checks with actionable information on failures. Maybe even adversarial AIs that critique one another if problematic things are done, like one sub-task for implementation and another for code-review.

Using AI tools has pushed me in that direction with some linter rules and prebuild scripts to enforce more consistent code - since previously you'd have to tell coworkers not to do something (because ofc nobody would write/read some obtuse style guide), but AI can generate code 10x faster than people do - so it helps to have immediate feedback along the lines of "Vue component names must not differ from the file that you're importing from" or "There is a translation string X in the app code that doesn't show up in the translations file" or "Nesting depth inside of components shouldn't exceed X levels and length shouldn't exceed Y lines" or "Don't use Tailwind class names for colors, here's a branded list that you can use: X, Y, Z", in addition to a TypeScript linter setup with recommended rules and a bunch of stuff for back end code.

Ofc none of those fully eliminates all risks, but they still seem like a sane thing to have, regardless of whether you use AI or not.
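A prebuild check like the translation-string one above can be tiny; the paths and the t('...') call convention below are assumptions for illustration, not a universal recipe:

    # Sketch of a prebuild check: every t('some.key') used in the frontend source
    # must exist in the translations file, otherwise fail the build.
    # File layout and a flat en.json are assumptions made for this example.
    import json, pathlib, re, sys

    translations = json.loads(pathlib.Path("src/i18n/en.json").read_text())
    used = set()
    for path in pathlib.Path("src").rglob("*.vue"):
        used |= set(re.findall(r"""\bt\(\s*['"]([\w.-]+)['"]""", path.read_text()))

    missing = sorted(k for k in used if k not in translations)
    if missing:
        print("Missing translation keys:", *missing, sep="\n  ")
        sys.exit(1)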


Generally speaking, with humans there are more guardrails & responsibility around letting someone run loose within an organization.

Even if you have a very smart new hire, it would be irresponsible/reckless as a manager to just give them all the production keys after a once-over and say "here's some tasks I want done, I'll check back at the end of the day when I come back".

If something bad happened, no doubt upper management would blame the human(s) and lecture about risk.

AI is a wonderful tool, but that's why giving an AI coding tool the keys and terminal powers and telling it to go do stuff while I grab lunch is kind of scary. Seems like living a few steps away from the edge of a fuck-up. So yeah... there need to be enforceable guardrails and fail-safes outside of the context / agent.


The bright side is that it should eventually be technically feasible to create much more powerful and effective guardrails around neural nets. At the end of the day, we have full access to the machine running the code, whereas we can't exactly go around sticking electrodes into everyone's brains, and even "just" constant monitoring is prohibitively expensive for most human work. The bad news is that we might be decades away from an understanding of how to create useful guardrails around AI, and AI is doing stuff now.

Precisely: while LLMs fail at complexity, DSLs can represent those divide-and-conquer intermediate levels to provide the most overall value with good accuracy. LLMs should make it easier to build DSLs themselves and to validate their translating code. The onus then is on the intelligent agent to identify and design those DSLs. This would require a true and deep understanding of the domain and an ability to synthesize, abstract, and codify it. I predict this will be the future job of today's programmer, quite a bit more complicated than what it is today, requiring a wider range of qualities and skills, and pushing those specializing in coding only to irrelevance.

Once AI improves its cost/error ratio enough the systems you are suggesting for humans will work here also. Maybe Claude/OpenAI will be pair programming and Gemini reviewing the code.

> Once AI improves

That's exactly the problematic mentality. Putting everything in a black box and then saying "problem solved; oh it didn't work? well maybe in the future when we have more training data!"

We're suffering from black-box disease and it's an epidemic.


The training data: the entirety of the internet and every single book we could get our hands on. "Surely we can just somehow give it more and it will be better!"

Also once people stop cargo-culting $trendy_dev_pattern it'll get less impactful.

Every time something new comes along, the same thing happens: people start exploring by putting it absolutely everywhere, no matter whether it makes sense. Add in a huge amount of cash VCs don't know what to spend on, and you end up with solutions galore, but none of them solving any real problems.

Microservices are a good example of a previous $trendy_dev_pattern that is now cooling down, and people are starting to at least ask the question "Do we actually need microservices here?" before design and implementation, something that has been lacking since it became a trendy thing. I'm sure the same will happen with LLMs eventually.


For that to work the error rate would have to be very low. Potentially lower than is fundamentally possible with the architecture.

And you’d have to assume that the errors LLMs make are random and independent.


As I get older I'm realizing a lot of things in this world don't get better. Some do, to be fair, but some don't.

Why does this conflict? Faster people don't negate the requirement for building systems that maintain safety in the face of errors.

> but it does seem fairly obvious to me that directly automating things with AI will probably always have substantial risk and you have much more assurance, if you involve AI in the process, using it to develop a traditional automation.

Sure but the point is you use it when you don't have the same simple flow. Fixed coding for clear issues, fall back afterwards.


This will drive development of systems that error-correct at scale, and orchestration of agents that feed back into those systems at different levels of abstraction to compensate for those modes of failure.

An AI software company will have to have a hierarchy of different agents, some of them writing code, some of them doing QA, some of them doing coordination and management, others taking into account the marketing angles, and so on, and you can emulate the role of a wide variety of users and skill levels all the way through to CEO level considerations. It'd even be beneficial to strategize by emulating board members, the competitors, and take into account market data with a team of emulated quants, and so on.

Right now we use a handful of locally competent agents that augment the performance of single tasks, and we direct them within different frameworks, ranging from vibecoding to diligent, disciplined use of DSL specs and limiting the space of possible errors. Over the next decade, there will be agent frameworks for all sorts of roles, with supporting software and orchestration tools that allow you to use AI with confidence. It won't be one-shot prompts with 15% hallucination rates, but a suite of agents that validate and verify at every stage, following systematic problem solving and domain modeling rules based on the same processes and systems that humans use.

We've got decades' worth of product development even if AI frontier model capabilities were to stall out at current levels. To all appearances, though, we're getting far more bang for our buck and the rate of improvement is still accelerating, so we may get AI so competent that the notion of these extensive agent frameworks for reliable AI companies will end up being as mismatched with market realities as those giant suitcase portable phones, or integrated car phones.


Well I don't see why that's a problem when LLMs are designed to replace the human part, not the machine part. You still need the exact same guardrails that were developed for human behavior because they are trained on human behavior.

Yep the further we go from highly constrained applications the riskier it'll always be

I was wondering if this needs more analysis, because I get this response a lot: people say, yeah, AI does things wrong sometimes, but humans do that too, so what? Or: humans are a mechanism for turning natural language into formal language and they get things wrong sometimes (as if you could never write a program that is clear and does what it should be doing), so go easy on AI. Where does this come from? It feels as if it's something psychological.

There's this huge wave of "don't anthropomorphize AI", but LLMs are much easier to understand when you think of them in terms of human psychology rather than as a program. Again and again, Hacker News is shocked that AI displays human-like behavior, and then chooses not to see it.

> LLMs are much easier to understand when you think of them in terms of human psychology

Are they? You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future. You can't have the same expectation from an LLM.

The only thing you should expect from an LLM is that its output is non-deterministic. You can expect the same from a human, of course, but you can fire a human if they keep making (the same) mistake(s).


While the slowness of learning of all ML is absolutely something I recognise, what you describe here:

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Wildly varies depending on the human.

Me? I wish I could learn German from a handful of examples. My embarrassment at my mistakes isn't enough to make it click faster, and it's not simply a matter of motivation here: back when I was commuting 80 minutes each way each day, I would fill the commute with German (app) lessons and (double-speed) podcasts. As the Germans themselves will sometimes say: Deutsche Sprache, schwere Sprache.

There's been a few programmers I've worked with who were absolutely certain they knew better than me, when they provably didn't.

One, they insisted a start-up process in a mobile app couldn't be improved, I turned it from a 20 minute task to a 200ms task by the next day's standup, but they never at any point showed any interest in improving or learning. (Other problems they demonstrated included not knowing or caring how to use automated reference counting, why copy-pasting class files instead of subclassing cannot be excused by the presence of "private" that could just have been replaced with "public", and casually saying that he had been fired from his previous job and blaming this on personalities without any awareness that even if true he was still displaying personality conflicts with everyone around him).

Another, complaining about too many views on screen, wouldn't even let me speak, threatened to end the call when I tried to say anything, even though I had already demonstrated before the call that even several thousand (20k?) widgets on-screen at the same time would still run at 60fps and they were complaining about order-of 100 widgets.


> Wildly varies depending on the human.

Sure. And the situation.

But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

No LLM is capable of it*.

* Where "it" is "recognizing they made a mistake in real time and learning from it on their own", as distinct from "having their human handlers recognize they made 20k mistakes after the fact and running a new training cycle to try to reduce that number (while also introducing fun new kinds of mistakes)".


> But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

> Where "it" is "recognizing they made a mistake in real time and learning from it on their own"

"Learn" I agree. But as an immediate output, weirdly not always: they can sometimes recognise they made a mistake and correct it.


> When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

It has huge practical impact.

If a human doesn't currently have the tools to exercise the capability, you can help them get those.

This is especially true when the tools in question are things like "enough time to actually think about their work, rather than being forced to rush through everything" or "enough mental energy in the day to be able to process and learn, because you're not being kept constantly on the edge of a breakdown." Or "the flexibility to screw up once in a while without getting fired." Now, a lot of managers refuse to give their subordinates those tools, but that doesn't mean that there's no practical impact. It means that they're bad managers and awful human beings.

An LLM will just always be nondeterministic. If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

> they can sometimes recognise they made a mistake and correct it.

...And other times, they "recognize they made a mistake" when they actually had it right, and "correct it" to something wrong.

"Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.


> you can help them get those.

A generic "you" might, I personally don't have that skill.

But then, I've never been a manager.

> An LLM will just always be nondeterministic.

This is not relevant; humans are also nondeterministic. At least practically speaking; the theory doesn't matter so much, as we can't duplicate our brains and test ourselves 10 times on the exact same input without each previous input affecting the next one.

> If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

Yes there is, this is what "prompt engineering" (even if "engineering" isn't the right word) is all about: https://en.wikipedia.org/wiki/Prompt_engineering

> "Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.

Yes. This means that anthropomorphising them leads to a useful prediction.

For similar reasons, I use words like "please" and "thank you" with these things, even though I don't actually expect these models to have constructed anything resembling a real human emotional qualia within them — humans do better when praised, therefore I have reason to expect that any machine that has learned to copy human behaviour will likely also do better when praised.


> This is not relevant, humans are also nondeterministic.

I mean, I suppose one can technically say that, but, as I was very clearly describing, humans both err in predictable ways, and can be taught not to err. Humans are not nondeterministic in anything like the same way LLMs are. LLMs will just always have some percentage chance of giving you confidently wrong answers. Because they do not actually "know" anything. They produce reasonable-sounding text.

> Yes there is

...And no matter how well you engineer your prompts, you cannot guarantee that the LLM's outputs will be any less confidently wrong. You can probably make some improvements. You can hope that your "prompt engineering" has some meaningful benefit. But not only is that nowhere near guaranteed, every time the models are updated, you run a very high risk that your "prompt engineering" tricks will completely stop working.

None of that is true with humans. Human fallibility is wildly different than LLM fallibility, is very-well-understood overall, and is highly and predictably mitigable.


They can also be told they made a mistake and then "correct" themselves by making the same mistake again.

> Are they?

Yes, hugely. Just assume it's like a random person from some specific pool, with certain instructions, whom you've just called on the phone. The idea that you get a fresh person if you call back is easy to understand.


I'm genuinely wondering if your parent comment is correct and the only reason we don't see the behaviour you describe, i.e. learning and growth, is because of how we do context windows: they're functionally equivalent to someone who has short-term memory loss. Think Drew Barrymore's character, or one of the people in that facility she ends up in, in the film 50 First Dates.

Their internal state moves them to a place where they "really intend" to help or change their behaviour (a lot of what I see is really consistent with that), and then they just... forget.


I think it's a fundamental limitation of how context works. Inputting information as context is only ever context; the LLM isn't going to "learn" any meaningful lesson from it.

You can only put information in context; it struggles learning lessons/wisdom


Not only, but also. The L in ML is very slow. (By example count required, not wall-clock).

On in-use learning, they act like the failure mode of "we have outsourced to a consultant that gives us a completely different fresh graduate for every ticket, of course they didn't learn what the last one you talked to learned".

Within any given task, the AIs have anthropomorphised themselves because they're copying humans' outputs. That the models model the outputs with only a best guess as to the interior system that generates those outputs is going to make it useful, but not perfect, to also anthropomorphise the models.

The question is, how "not perfect" exactly? Is it going to be like early Diffusion image generators with the psychological equivalent of obvious Cronenberg bodies? Or the current ones where you have to hunt for clues and miss it on a quick glance?


No, the idea is just stupid.

I just don't understand how anyone who actually uses the models all the time can think this.

The current models themselves can even explain what a stupid idea this is.


Obviously they aren't actually people so there are many low hanging differences. But consider this: Using words like please and thank you get better results out of LLMs. This is completely counterintuitive if you treat LLMs like any other machine, because no other machine behaves like that. But it's very intuitive if you approach them with thinking informed by human psychology.

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Have you talked to a human? Like, ever?


Have you?

One day you wake up, and find that you now need to negotiate with your toaster. Flatter it maybe. Lie to it about the urgency of your task to overcome some new emotional inertia that it has suddenly developed.

Only toast can save us now, you yell into the toaster, just to get on with your day. You complain about this odd new state of things to your coworkers and peers, who like yourself are in fact expert toaster-engineers. This is fine they say, this is good.

Toasters need not reliably make toast, they say with a chuckle, it's very old fashioned to think this way. Your new toaster is a good toaster, not some badly misbehaving mechanism. A good, fine, completely normal toaster. Pay it compliments, they say, ask it nicely. Just explain in simple terms why you deserve to have toast, and if from time to time you still don't get any, then where's the harm in this? It's really much better than it was before


It reminds me of the start of Ubik[1], where one of the protagonists has to argue with their subscription-based apartment door. Given also the theme of AI hallucinations, that book has become even more prescient than when it was written.

[1]https://en.wikipedia.org/wiki/Ubik



This comparison is extremely silly. LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, show me Russian <-> Japanese translation software that doesn't use AI and comes anywhere close to the performance and reliability of LLMs. "Please close the castle when leaving the office." "I got my wisdom carrot extracted." "He's pregnant." This was the level of machine translation from English before AI; from Japanese it was usually pure garbage.

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise.

Is it really OK to have to negotiate with a toaster if it additionally works as a piano and a phone? I think not. The first step is admitting there is obviously a problem; afterwards you can think of ways to adapt.

FTR, I'm very much in favor of AI, but my enthusiasm especially for LLMs isn't unconditional. If this kind of madness is really the price of working with it in the current form, then we probably need to consider pivoting towards smaller purpose-built LMs and abandoning the "do everything" approach.


We are there in the small already. My old TV had a receiver and a pair of external speakers connected to it. I could decrease and increase the receiver volume with its extra remote. Two buttons, up and down. This was with an additional remote that came with the receiver.

Nowadays, a more capable 5.1 speaker receiver is connected to the TV.

There is only one remote, for both. To increase or decreae the volume after starting the TV now, I have to:

1. wait a few seconds while the internal speakers in the TV starts playing sound

2. the receiver and TV connect to each other, audio switches over to receiver

3. wait a few seconds

4. the TV channel (or Netflix or whatever) switches over to the receiver welcome screen. Audio stops playing, but audio is now switched over to the receiver, but there is no indication of what volume the receiver is set to. It's set to whatever it was last time it was used. It could be level 0, it could be level 100 or anything in between.

5. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

6. Sorry, you were too impatient and fast when you switched back to TV, the receiver wants to show you its welcome screen again.

7. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

8. Now you can change volume up and down. Very, very slowly. Hope it's not at night and you don't want to wake anyone up.


Yep, it's a decent analogy: Giving up actual (user) control for the sake of having 1 controller. There's a type of person that finds it convenient. And another type that finds it a sloppy piss-poor interface that isn't showing off any decent engineering or design. At some point, many technologists started to fall into the first category? It's one thing to tolerate a bad situation due to lack of alternatives, but very different to slip into thinking that it must be the pinnacle of engineering excellence.

Around now some wit usually asks if the luddites also want to build circuits from scratch or allocate memory manually? Whatever, you can use a garbage collector! Point is that good technologists will typically give up control tactically, not as a pure reflex, and usually to predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances.


> predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances

I'd add with reliability, boundaries, and tolerances within the necessary values.

The problem with the TV remote is that nobody has given a damn about ergonomic needs for decades. The system is reliable, well understood, and has well known boundaries and tolerances; those are just completely outside of the requirements of the problem domain.

But I guess that's a completely off-topic tangent. LLMs fail much earlier.


>LLMs solve reliably entire classes of problems that are impossible to solve otherwise

Great! Agreed! So we're going to restrict LLMs to those classes of problems, right? And not invest trillions of dollars into the infrastructure, because these fields are only billion dollar problems. Right? Right!?



Remember: a phone is a phone, you're not supposed to browse the internet on it.

Not if 1% of the time it turns into a pair of scissors.

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, [...] Russian <-> Japanese translation

Great! Name another?


I admit Grok is capable of praising Elon Musk way more than any human intelligence could.

BUTTER ROBOT: What is my purpose?

RICK: You pass butter.

BUTTER ROBOT: ... Oh my God.

RICK: Yeah, welcome to the club, pal.

https://youtube.com/watch?v=X7HmltUWXgs


Not surprising to see this downvoted so much, but it's very true: it's a great first-order approximation, and yet users here will be continually surprised when they act like people.
