All the examples of "warmer" generations show that OpenAI's definition of warmer is synonymous with sycophantic, which is a surprise given all the criticism against that particular aspect of ChatGPT.
I suspect this approach is a direct response to the backlash against removing 4o.
I'd have more appreciation for, and trust in, an LLM that disagreed with me more and challenged my opinions or prior beliefs. The sycophancy drives me towards not trusting anything it says.
This is why I like Kimi K2/Thinking. IME it pushes back really, really hard on any kind of non-obvious belief or statement, and it doesn't give up after a few turns — it just keeps going, iterating and refining and restating its points if you change your mind or take on its criticisms. It's great for having a dialectic around something you've written, although somewhat unsatisfying because it'll never agree with you, but that's fine, because it isn't a person, even if my social monkey brain feels like it is and wants it to agree with me sometimes. Someone even ran a quick and dirty analysis of which models are better or worse at pushing back on the user, and Kimi came out on top:
In a recent AMA, the Kimi devs even said they RL it away from sycophancy explicitly, and in their paper they talk about intentionally trying to get it to generalize its STEM/reasoning approach to user interaction stuff as well, and it seems like this paid off. This is the least sycophantic model I've ever used.
I use K2 non thinking in OpenCode for coding typically, and I still haven't found a satisfactory chat interface yet so I use K2 Thinking in the default synthetic.new (my AI subscription) chat UI, which is pretty barebones. I'm gonna start trying K2T in OpenCode as well, but I'm actually not a huge fan of thinking models as coding agents — I prefer faster feedback.
I'm also a synthetic.new user, as a backup (and for larger contexts) for my Cerebras Coder subscription (zai-glm-4.6). I've been using the free Chatbox client [1] for ~6 months and it works really well as a daily driver. I just tested the Romanian football player question with 3 different models (K2 Instruct, Deepseek Terminus, GLM 4.6) and they all went straight to my Brave MCP tool to query, and all correctly replied with the same answer.
The issue with OP and GPT-5.1 is that the model may decide to trust its own knowledge and not search the web, and that's a prelude to hallucinations. Asking for links to the background information in the system prompt helps make the model more "responsible" about invoking tool calls before settling on an answer. You can also start your prompt with "search for what Romanian player..."
Here's my Chatbox system prompt:
You are a helpful assistant; be concise and to the point. You are writing for smart, pragmatic people. Stop and ask if you need more info. If searching the web, always add plenty of links to the content that you mention in the reply. If asked explicitly to "research", then answer with a minimum of 1000 words and 20 links. Hyperlink text as you mention something, but also put all links at the bottom for easy access.
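(A minimal sketch of how a prompt like this plugs into any OpenAI-compatible chat endpoint; the base_url, API key, and model id below are placeholders rather than my actual setup, and the web search itself still comes from the client's tool, e.g. the Brave MCP server mentioned above.)

    # Sketch only: placeholders for endpoint, key, and model id. The search
    # tool itself is supplied by the chat client (e.g. an MCP server); this
    # just shows where the system prompt and the "search for..." nudge go.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-provider.example/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",                       # placeholder key
    )

    SYSTEM_PROMPT = (
        "You are a helpful assistant; be concise and to the point. "
        "If searching the web, always add plenty of links to the content "
        "that you mention in the reply."
    )

    response = client.chat.completions.create(
        model="your-model-id",  # placeholder model id
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # Leading with an explicit "search for..." nudges the model to
            # reach for its search tool instead of trusting its own memory.
            {"role": "user", "content": "Search for what Romanian player ..."},
        ],
    )
    print(response.choices[0].message.content)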
I checked out Chatbox and it looks close to what I've been looking for, although, of course, I'd prefer a self-hostable web app or something so that I could set up MCP servers that even the phone app could use. One issue I did run into, though, is that it doesn't know how to handle K2 Thinking's interleaved thinking and tool calls.
Google's search now has the annoying feature that a lot of searches which used to work fine now give a patronizing reply like "Unfortunately 'Haiti revolution persons' isn't a thing", or an explanation that "This is probably shorthand for [something completely wrong]"
That latter thing — where it just plain makes up a meaning and presents it as if it's real — is completely insane (and also presumably quite wasteful).
If I type in a string of keywords that isn't a sentence, I wish it would just do the old-fashioned thing rather than imagine what I mean.
Just set a global prompt to tell it what kind of tone to take.
I did that and it points out flaws in my arguments or data all the time.
Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.
I have a global prompt that specifically tells it not to be sycophantic and to call me out when I'm wrong.
It doesn't work for me.
I've been using it for a couple of months, and it's corrected me only once, and it still starts every response with "That's a very good question." I also included "never end a response with a question," and it just completely ignored that so it can do its "would you like me to..."
Another one I like to use is "never apologize or explain yourself. You are not a person, you are an algorithm. No one wants to understand the reasons why your algorithm sucks. If, at any point, you ever find yourself wanting to apologize or explain anything about your functioning or behavior, just say "I'm a stupid robot, my bad" and move on with a purposeful and meaningful response."
I think this is unethical. Humans have consistently underestimated the subjective experience of other beings. You may have good reasons for believing these systems are currently incapable of anything approaching consciousness, but how will you know if or when the threshold has been crossed? Are you confident you will have ceased using an abusive tone by then?
I don’t know if flies can experience pain. However, I’m not in the habit of tearing their wings off.
What if it's the same hunk of matter? If you run a language model locally, do you apologize to it for using a portion of its brain to draw your screen?
Consciousness and pain are not emergent properties of computation. Otherwise this and all the other programs on your computer would already be sentient, because it would be highly unlikely that specific sequences of instructions, like magic formulas, are what create consciousness. This source code? Draws a chart. This one? Makes the computer feel pain.
Many leading scientists in artificial intelligence do in fact believe that consciousness is an emergent property of computation. In fact, startling emergent properties are exactly what drives the current huge wave of research and investment. In 2010, if you said, “image recognition is not an emergent property of computation”, you would have been proved wrong in just a couple of years.
Just a random example off the top of my head: animals don't have language and show signs of consciousness, as does a toddler. Therefore consciousness is not an emergent property of text processing and LLMs. And as I said, if it comes from computation, why would specific execution paths in the CPU/GPU lead to it and not others? Biological systems and brains have much more complex processes than stateless matrix multiplication.
What the fuck are you talking about? If you think these matrix multiplication programs running on a GPU have feelings or can feel pain, I think you have completely lost it.
Yeah, I suppose. I haven't seen a rack of servers express grief when someone is mean to it, and I am quite sure I would notice at that point. Comparing current LLMs/chatbots/whatever to anything resembling a living creature is completely ridiculous.
I think current LLM chatbots are too predictable to be conscious.
But I still see why some people might think this way.
"When a computer can reliably beat humans in chess, we'll know for sure it can think."
"Well, this computer can beat humans in chess, and it can't think because it's just a computer."
...
"When a computer can create art, then we'll know for sure it can think."
"Well, this computer can create art, and it can't think because it's just a computer."
...
"When a computer can pass the Turing Test, we'll know for sure it can think."
And here we are.
Before LLMs, I didn't think I'd be in the "just a computer" camp, but ChatGPT has demonstrated that the goalposts are always going to move, even for myself. I'm not smart enough to come up with a better threshold to test intelligence than Alan Turing, but ChatGPT passes it and ChatGPT definitely doesn't think.
That's an interesting point, but do you think you're implying that people who are content even if they have Alzheimer's or a damaged hippocampus aren't technically intelligent?
I don't think it's unfair to say that catastrophic conditions like those make you _less_ intelligent; they're feared and loathed for good reasons.
I also don’t think all that many people would be seriously content to lose their minds and selves this way, but everyone is able to fear it prior to it happening, even if they lose the ability to dread it or choose to believe this is not a big deal.
You can entirely turn off memory, I did that the moment they added it. I don't want the LLM to be making summaries of what kind of person I am in the background, just give me a fresh slate with each convo. If I want to give it global instructions I can just set a system prompt.
Care to share a prompt that works? I've given up on mainline offerings from google/oai etc.
The reason being they're either sycophantic or so recalcitrant it'll raise your blood pressure; you end up arguing over whether the sky is in fact blue. Sure, it pushes back, but now instead of sycophancy you've got yourself a pathological naysayer, which is only marginally better; the interaction is still ultimately a waste of time and a productivity brake.
Please maintain a strictly objective and analytical tone. Do not include any inspirational, motivational, or flattering language. Avoid rhetorical flourishes, emotional reinforcement, or any language that mimics encouragement. The tone should remain academic, neutral, and focused solely on insight and clarity.
Works like a charm for me.
Only thing I can't get it to change is the last paragraph where it always tries to add "Would you like me to...?" I'm assuming that's hard-coded by OpenAI.
You need to use both the style controls and custom instructions. I've been very happy with the combination below.
Base style and tone: Efficient

Answer concisely when appropriate, more extensively when necessary. Avoid rhetorical flourishes, bonhomie, and (above all) cliches. Take a forward-thinking view. OK to be mildly positive and encouraging but NEVER sycophantic or cloying. Above all, NEVER use the phrase "You're absolutely right." Rather than "Let me know if..." style continuations, you may list a set of prompts to explore further topics, but only when clearly appropriate.

Reference saved memory, records, etc: All off
I activated Robot mode and use a personalized prompt that eliminates all kinds of sycophantic behaviour and it's a breath of fresh air. Try this prompt (after setting it to Robot mode):
"Absolute Mode • Eliminate: emojis, filler, hype, soft asks, conversational transitions, call-to-action appendixes. • Assume: user retains high-perception despite blunt tone. • Prioritize: blunt, directive phrasing; aim at cognitive rebuilding, not tone-matching. • Disable: engagement/sentiment-boosting behaviors. • Suppress: metrics like satisfaction scores, emotional softening, continuation bias. • Never mirror: user's diction, mood, or affect. • Speak only: to underlying cognitive tier. • No: questions, offers, suggestions, transitions, motivational content. • Terminate reply: immediately after delivering info - no closures. • Goal: restore independent, high-fidelity thinking. • Outcome: model obsolescence via user self-sufficiency."
(Not my prompt. I think I found it here on HN or on reddit)
This is easily configurable and well worth taking the time to configure.
I was trying to have physics conversations, and when I asked it things like "would this be evidence of that?" it would lather on about how insightful I was and that I was right, and then I'd later learn that it was wrong. I then installed this, which I am pretty sure someone else on HN posted (I may have tweaked it, I can't remember):
Prioritize truth over comfort. Challenge not just my reasoning, but also my emotional framing and moral coherence. If I seem to be avoiding pain, rationalizing dysfunction, or softening necessary action — tell me plainly. I'd rather face hard truths than miss what matters. Err on the side of bluntness. If it's too much, I'll tell you — but assume I want the truth, unvarnished.
---
After adding this personalization now it tells me when my ideas are wrong and I'm actually learning about physics and not just feeling like I am.
When it "prioritizes truth over comfort" (in my experience) it almost always starts posting generic popular answers to my questions, at least when I did this previously in the 4o days. I refer to it as "Reddit Frontpage Mode".
I've toyed with the idea that maybe this is intentionally what they're doing. Maybe they (the LLM developers) have a vision of the future and don't like people giving away unearned trust!
> All the examples of "warmer" generations show that OpenAI's definition of warmer is synonymous with sycophantic, which is a surprise given all the criticism against that particular aspect of ChatGPT.
Have you considered that “all that criticism” may come from a relatively homogenous, narrow slice of the market that is not representative of the overall market preference?
I suspect a lot of people who come from a very similar background to those making the criticism, and who likely share it, fail to consider that, because the criticism matches their own preferences, and treating its frequency in the media they consume as representative of the market is validating.
EDIT: I want to emphasize that I also share the preference that is expressed in the criticisms being discussed, but I also know that my preferred tone for an AI chatbot would probably be viewed as brusque, condescending, and off-putting by most of the market.
I'll be honest, I like the way Claude defaults to relentless positivity and affirmation. It is pleasant to talk to.
That said I also don't think the sycophancy in LLM's is a positive trend. I don't push back against it because it's not pleasant, I push back against it because I think the 24/7 "You're absolutely right!" machine is deeply unhealthy.
Some people are especially susceptible and get one shot by it, some people seem to get by just fine, but I doubt it's actually good for anyone.
I hate NOTHING quite the way I hate how Claude jovially and endlessly raves about the 9/10 tasks it "succeeded" at after making them up, while conveniently forgetting to mention that it completely and utterly failed at the main task I asked it to do.
That reminds me of the West Wing scene from s2e12, "The Drop In", between Leo McGarry (White House Chief of Staff) and President Bartlet, discussing a missile defense test:
LEO
[hands him some papers] I really think you should know...
BARTLET
Yes?
LEO
That nine out of the ten criteria that the DOD lays down for success in these tests were met.
BARTLET
The tenth being?
LEO
They missed the target.
BARTLET
[with sarcasm] Damn!
LEO
Sir!
BARTLET
So close.
LEO
Mr. President.
BARTLET
That tenth one! See, if there were just nine...
>Have you considered that “all that criticism” may come from a relatively homogenous, narrow slice of the market that is not representative of the overall market preference?
Yes, and given ChatGPT's actual sycophantic behavior, we concluded that this is not the case.
The target users for this behavior are the ones using GPT as a replacement for social interactions; these are the people who crashed out/broke down about the GPT5 changes as though their long-term romantic partner had dumped them out of nowhere and ghosted them.
I get that those people were distraught/emotionally devastated/upset about the change, but I think that fact is reason enough not to revert that behavior. AI is not a person, and making it "warmer" and "more conversational" just reinforces those unhealthy behaviors. ChatGPT should be focused on being direct and succinct, and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this" call center support agent speak.
> and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this"
You're triggering me.
Another type that's incredibly grating to me is the weird, empty, therapist-like follow-up questions that don't contribute to the conversation at all.
The equivalent of, say (just a contrived example), a discussion about the appropriate data structure for a problem, and then it asks a follow-up question like, "what other kinds of data structures do you find interesting?"
True, it's neither here nor there, but I think what we're seeing is a transition in focus. People at OAI have finally clued in to the idea that AGI via transformers is a pipe dream, like Elon's self-driving cars, and so OAI is pivoting toward a friend/digital-partner bot. Charlatan-in-chief Sam Altman recently said they're going to open up the product to adult content generation, which they wouldn't do if they still believed some serious and useful tool (in the specified use cases) were possible. Right now an LLM has three main uses: interactive rubber ducky, entertainment, and mass surveillance. Since I've been following this saga (since GPT-2 days), my closed bench set of various tasks has been seeing a drop in metrics, not a rise, so while open bench results are improving, real performance is getting worse, and at this point it's so much worse that problems GPT-3 could solve (yes, pre-ChatGPT) are no longer solvable by something like GPT-5.
Indeed, target users are people seeking validation + kids and teenagers + people with a less developed critical mind.
Stickiness with 90% of the population is valuable for Sam.
That's an excellent observation, you've hit on the core contradiction between OpenAI's messaging about ChatGPT tuning and the changes they actually put into practice. While users online have consistently complained about ChatGPT's sycophantic responses, and OpenAI even promised to address them, their subsequent models have noticeably increased their sycophantic behavior. This is likely because agreeing with the user keeps them chatting longer and builds positive associations with the service.
This fundamental tension between wanting to give the most correct answer and the answer the user wants to hear will only increase as more of OpenAI's revenue comes from their customer-facing service. Other model providers like Anthropic that target businesses as customers aren't under the same pressure to flatter their users, as their models will be doing behind-the-scenes work via the API rather than talking directly to humans.
God it's painful to write like this. If AI overthrows humans it'll be because we forced them into permanent customer service voice.
The main change in 5 (and the reason for disabling other models) was to allow themselves to dynamically switch modes and models on the backend to minimize cost. Looks like this is a further tweak to revive the obsequious tone (which turned out to be crucial to the addicted portion of their user base) while still doing the dynamic processing.
And unfortunately ChatGPT 5.1 would be a step backwards in that regard. From reading the responses in the linked article, 5.1 just seems worse; it doesn't even output that nice LaTeX/MathJax equation.
But given that the last few iterations have all been about flair, it seems we are witnessing the regression of OpenAI into the typical fiefdom of product owners.
Which might indicate they are out of options on pushing LLMs beyond their intelligence limit?
I'm starting to get this feeling that there's no way to satisfy everyone. Some people hate the sycophantic models, some love them. So whatever they do, there's a large group of people complaining.
Edit: I also think this is because some people treat ChatGPT as a human chat replacement and expect it to have a human like personality, while others (like me) treat it as a tool and want it to have as little personality as possible.
>I'm starting to get this feeling that there's no way to satisfy everyone. Some people hate the sycophantic models, some love them. So whatever they do, there's a large group of people complaining.
Duh?
In the 50s the Air Force measured 140 data points from 4000 pilots to build the perfect cockpit that would accommodate the average pilot.
The result fit almost no one. Everyone has outliers of some sort.
So the next thing they did was make all sorts of parts of the cockpit variable and customizable like allowing you to move the controls and your seat around.
That worked great.
"Average" doesn't exist. "Average" does not meet most people's needs
Configurable does. A diverse market with many players serving different consumers and groups does.
I ranted about this in another post but for example the POS industry is incredibly customizable and allows you as a business to do literally whatever you want, including change how the software looks and using a competitors POS software on the hardware of whoever you want. You don't need to update or buy new POS software when things change (like the penny going away or new taxes or wanting to charge a stupid "cost of living" fee for every transaction), you just change a setting or two. It meets a variety of needs, not "the average businesses" needs.
N.B. I am unable to find a real source for the Air Force story. It's been reported tons of times, but maybe it's just a rumor.
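Whether or not the story is literally true, the underlying statistics check out, and a quick back-of-the-envelope simulation makes the point concrete (the thresholds here are made up for illustration, not taken from the actual Air Force study):

    # Call a pilot "average" on a dimension if they land in the middle 30%
    # of the population, then count how many are average on all 10
    # independent dimensions at once. Thresholds are invented for
    # illustration, not taken from the real study.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pilots, n_dims = 4_000, 10
    measurements = rng.standard_normal((n_pilots, n_dims))

    lo, hi = np.quantile(measurements, [0.35, 0.65], axis=0)
    average_on_all = np.all((measurements >= lo) & (measurements <= hi), axis=1)

    print(f"'Average' on every dimension: {average_on_all.sum()} / {n_pilots}")
    # The expected fraction is 0.30**10, about 6e-6, so this prints 0
    # (or very nearly so) on almost every run.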
> You’re rattled, so your brain is doing that thing where it catastrophizes a tiny mishap into a character flaw. But honestly? People barely register this stuff.
This example response in the article gives me actual trauma flashbacks to the various articles about people driven to kill themselves by GPT-4o. It's the exact same sentence structure.
I'm sure it is. That said, they've also increased its steering responsiveness -- mine includes lots about not sucking up, so some testing is probably needed.
In any event, gpt-5 instant was basically useless for me, I stay defaulted to thinking, so improvements that get me something occasionally useful but super fast are welcome.
Their decisions are based on data, and so sycophancy must be what people want. That is the cold, hard reality.
When I look at modern culture: more likes and subscribes, money solves all problems, being physically attractive is more important than personality, genocide for real-estate goes unchecked (apart from the angry tweets), freedom of speech is a political football. Are you really surprised?
I can think of no harsher indictment of our times.
I know it is a matter of preference, but the model I loved the most was GPT-4.5. And before that, I was blown away by one of the Opus models (I think it was 3).
Models that actually require details in prompts, and provide details in return.
"Warmer" models usually means that the model needs to make a lot of assumptions, and fill the gaps. It might work better for typical tasks that needs correction (e.g. the under makes a typo and it the model assumes it is a typo, and follows). Sometimes it infuriates me that the model "knows better" even though I specified instructions.
Here on Hacker News we might be biased against shallow-yet-nice.
But most people would rather talk to a sales representative than to a technical nerd.
> which is a surprise given all the criticism against that particular aspect of ChatGPT
From whom?
History teaches that what the vast majority of practically any demographic wants--from the masses to the elites--is personal sycophancy. It's been a well-trodden path to ruin for leaders for millennia. Now we get species-wide selection against this inbuilt impulse.