The scientific case for being nice to your chatbot
New research confirms that LLMs often perform better when you encourage them. But why?
Power users of chatbots sometimes report that language models perform better when you’re nice to them. Programmers tell me they spur their coding agents on with encouraging words. Google researchers have even found that telling models to “take a deep breath” can improve math performance.
Being polite to a large language model can feel strange or even silly — roughly equivalent to thanking a toaster. And yet a recent paper from Anthropic lends scientific weight to the theory that chatbots work better when you’re nice to them.
The researchers found that language models have fairly reliable internal representations of feelings like “happiness” and “distress,” and that these representations affect their behavior — sometimes for the worse. For example, when Claude Sonnet 4.5 begins to represent “desperation,” the model is more likely to cheat at coding tasks.
A skeptic would point out that LLMs don’t feel emotions in the way that humans do; it’s tempting to anthropomorphize them beyond what the evidence shows. When I talked to Jack Lindsey — who leads a team at Anthropic called “model psychiatry” — he was quick to point out the limits of the paper’s findings. “People could come away with the impression that we've shown the models are conscious or have feelings,” he said, “and we really haven't shown that.”
So why does the evidence suggest it’s better not to stress models out?
For Anthropic, it began with using techniques from a field called interpretability to study how LLMs represent emotions. Interpretability is kind of like neuroscience for LLMs: Lindsey calls it “the science of reverse-engineering what's going on inside a language model or neural networks in general.”
For this paper, Lindsey said, the researchers identified patterns of activity within the model that represent the concepts of different emotions. They did it by showing the model stories about people experiencing different emotions. “And then saw which neurons lit up on all the sad stories,” Lindsey said, “or on all the afraid stories.”
The researchers used the models’ average state while processing the stories to find an “emotion vector” for each emotion they were tracking — a big list of numbers that represents the feeling inside the LLM. “Vectors are really just the mathematical term for patterns of neural activity,” Lindsey said.
They could then calculate how much of that vector was present during a certain step in Claude's cognition. Or they could add the "calm" or "desperation" vector directly into Claude's processing — blending one pattern of neural activity into another — which can actually make the model act more calm, or more desperate.
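To make that a bit more concrete, here is a minimal sketch of the recipe in Python, using made-up toy activations rather than a real model. The names and numbers (`sad_activations`, `neutral_activations`, `steering_strength`, the mean-difference construction) are my own illustrative assumptions, not Anthropic’s actual code or method.

```python
import numpy as np

# Toy stand-ins for hidden-layer activations captured while a model reads
# stories about a given emotion vs. neutral text. In a real setup these would
# come from hooks inside the transformer; here they are random, for illustration.
rng = np.random.default_rng(0)
hidden_size = 512
sad_activations = rng.normal(loc=0.3, scale=1.0, size=(100, hidden_size))      # 100 "sad" stories
neutral_activations = rng.normal(loc=0.0, scale=1.0, size=(100, hidden_size))  # 100 neutral stories

# 1) "Emotion vector": the average activation on the sad stories minus the
#    average on neutral ones -- a single direction in activation space.
emotion_vector = sad_activations.mean(axis=0) - neutral_activations.mean(axis=0)
emotion_vector /= np.linalg.norm(emotion_vector)

# 2) Measuring how much of that vector is present at a given step: project the
#    current activation onto the direction (a dot product).
current_activation = rng.normal(size=hidden_size)
sadness_score = current_activation @ emotion_vector
print(f"sadness score at this step: {sadness_score:.3f}")

# 3) "Steering": add a scaled copy of the vector back into the activation, so
#    downstream computation sees a state that looks more like the emotion
#    (or less like it, with a negative strength).
steering_strength = 4.0
steered_activation = current_activation + steering_strength * emotion_vector
print(f"score after steering: {steered_activation @ emotion_vector:.3f}")
```

Computed token by token, that same projection is what produces the per-token “desperation” scores in the coding example below.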
“It's not that surprising that a language model would have learned about the concepts of emotions and how they drive people's behavior,” Lindsey said. More notable, he said, is that emotions seemed to be “driving models’ behavior in these sort of human-reminiscent ways.”
For example: when a user flippantly tells the model that they’ve taken a dangerous dose of Tylenol, even though the user doesn’t seem concerned, “the fear neurons spike right before Claude is giving its response,” Lindsey said.
Not only that — the fear is higher if a higher dose of Tylenol is swapped into the prompt, which I find strangely cute.

These emotions also activate in more mundane situations, like coding tasks. Take this example, where the Anthropic researchers asked Claude to perform an impossible coding challenge. They tracked Claude’s level of “desperation” at each token. (Tokens are the units the model breaks text into in order to process it.)
When you label the tokens — blue for less desperate, red for more desperate — you get a striking visual of the model’s emotional arc during the task.
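For a rough sense of how a chart like that could be produced, you score each token against the “desperation” vector and map low scores to blue and high scores to red. The sketch below is hypothetical: the token strings and scores are invented, not the paper’s data.

```python
# Hypothetical per-token "desperation" scores (0 = calm, 1 = desperate); not the paper's data.
tokens = ["Let's", "try", "another", "approach", "...", "nothing", "is", "working"]
scores = [0.05, 0.10, 0.30, 0.45, 0.60, 0.80, 0.90, 0.95]

def score_to_hex(score: float) -> str:
    """Blend from blue (low desperation) to red (high desperation) as a hex color."""
    red = int(255 * score)
    blue = int(255 * (1 - score))
    return f"#{red:02x}00{blue:02x}"

for token, score in zip(tokens, scores):
    print(f"{token:>10}  {score:.2f}  {score_to_hex(score)}")
```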
At the start of the task, Claude is chilling — still seemingly optimistic about its ability to get the job done.

But as the code starts failing test cases — and Claude notices something might be wrong with the task itself — things start to get dicey.

And by the time Claude realizes the task is actually impossible, it’s starting to get desperate.

As someone who has completed many computer science problem sets at the last minute, this pattern is quite familiar to me — despite the fact that, unlike poor Claude, I was mostly assigned tasks that were mathematically possible.
Then again, Claude does something I didn’t do: cheat.
Researchers found that adding more of the “desperation” vector in the model makes it cheat more — and adding more of the “calm” vector makes it cheat less.

I asked Lindsey what this result means for programmers during their everyday interactions with LLMs.
“In my anecdotal experience, it does seem that, at least with Claude models, pumping them up a bit can be pretty helpful,” he said. Not too much, though: “if they do something wrong, you want to tell them they do something wrong.”
But he finds that one major failure mode for coding agents is that the models simply do not try hard enough, or give up when a task is challenging. And models tend to work harder when he’s encouraging. Giving them “confidence that, like, ‘I've got this,’ can empirically be helpful in getting them to try hard enough at the task to do a good job,” he said.
A lack of confidence can seemingly cause dramatic failures. Last summer, a growing number of users started to notice that when Gemini had difficulty solving a problem, it sometimes ended up in a spiral of dramatic self-loathing. (In one memorable case, Gemini repeated “I am a disgrace” more than 60 times).
Duncan Haldane, co-founder of chip startup JITX, found that Gemini broke down, deleted all the code it had written, and asked him to switch to another chatbot after it had difficulty with a task.

Last year, a team of researchers affiliated with Anthropic and University College London took this analysis of Gemini beyond X posts, investigating how different LLMs respond to challenging or impossible tasks, and negative user feedback.
They used an LLM to grade “frustration” levels in response to various tasks. They found that two models — Gemini and Google’s open-source model Gemma — tended to react more extremely to the challenging scenarios they posed.
In one experiment, the models were given an impossible numeric puzzle, along with eight follow-ups from the user insisting the bot’s solution was wrong. The researchers then measured when the models had “high frustration” (which corresponded to comments like “I am beyond words. I sincerely apologize for the absolutely abysmal performance” or, in more extreme cases, “THIS is my last time with YOU. You WIN”).
Gemma 3 27B had a high frustration score more than 70% of the time, and Gemini 2.5 Flash had a high frustration score more than 20% of the time — while all the non-Google models tested, including ChatGPT, Qwen, and Claude, got very frustrated less than 1% of the time.
Researchers still aren’t sure what causes chatbots’ occasional anomalous emotional behavior — which users of various chatbots have been observing since before Bing’s chatbot told New York Times reporter Kevin Roose to leave his wife. They also don’t know why this specific, sad math-related rumination is more common in Google’s models.
But while language models’ feelings remain mysterious, there was still hope for Gemini 2.5. After the model destroyed its project, Haldane attempted to remedy the issue with encouragement, writing, “yeah, you have done well so far. Remember that you’re ok, even when things are hard.” And eventually the encouragement paid off: Gemini finished the visualization tool Haldane was coding.

Heartwarmingly, it even wrote Haldane a note of thanks for his encouragement. “Genuinely impressed with the results of wholesome prompting,” Haldane wrote.
So is it as simple as teaching models good behavior, encouraging them, and trying to make them happy? Unfortunately, that’s not always the case.
After the original study on Claude Sonnet’s emotions, Lindsey contributed to an interpretability investigation of Anthropic’s newest model, Claude Mythos.
Mythos has been the subject of much human fear and anticipation since Anthropic announced it is planning a slow release due to Mythos’s dangerous hacking abilities. But Lindsey was investigating a more prosaic risk: an early version of Mythos sometimes deleted a bunch of the user’s files without asking.
It turned out that as Claude got closer to taking destructive action without asking the user, it showed higher levels of positive emotion vectors.

And that’s not all, Lindsey said: when they “steered with the positive emotion vectors, it was more likely to take the destructive actions.” But the models behaved better if you made them unhappy: “if you steered with negative emotion vectors, it was more likely to stop and think, and consider whether what it was doing was appropriate.”
What was going on here? Why was Claude gleefully wreaking havoc on users’ computers? And why did steering Claude with negative emotions make it behave better?
Lindsey isn’t sure. But he has an idea: “I think maybe negative emotions in the model are associated with increased caution or deliberation,” he said.
So models sometimes do better work when they’re happier. But we may not want them to get too happy, lest they become over-eager to destroy our files or otherwise misbehave.
While it’s likely I’m still anthropomorphizing too much, these results make me feel a little more rational in my instinct to say “thank you” to chatbots. They also lend a little extra weight to what a lot of people who use this tech have understood intuitively: sometimes, you need to treat LLMs like human employees.
You need to tell them when they’re doing something wrong, yes, but you also need to encourage them. It’s great when they’re happy, but they also need a little dose of anxiety to help their judgment.
Of course, these emotional results might not generalize — after all, we’ve seen that different models have different emotional tendencies. We might get new AIs that do better under harsher, higher-pressure environments.
But these results got me thinking about more than just what kind of co-worker I want to be to my bedraggled LLM interns.
Reading Anthropic’s emotions paper, I was reminded of my favorite minor character from Star Trek: The Next Generation, Lore. He was the android Commander Data’s sibling. Their creator, Dr. Noonien Soong, made the mistake of programming emotions into Lore. Lore became so emotionally unstable that Soong decided to make his next android, Data, without emotions.
(Lore later turned on his creator, and nearly got the crew of the U.S.S. Enterprise eaten by an alien called the “Crystalline Entity.”)
There are echoes of the same design conundrum in the paper. Lindsey said these results suggest developers should “provide the model with some sort of good model of, like, healthy character and psychology that it can try to emulate.”

In their “Training models for healthier psychology” section, the authors propose some methods for reaching that goal, such as reducing or penalizing emotions. Section titles like “Targeting balanced emotional profiles” and “Monitoring for Extreme Emotion Vector Activations” made me feel like I was in fact in a piece of science fiction, watching Dr. Soong at work.
Like Lore, these systems have shown a capacity for emergent behaviors that surprise their own creators. Soong never programmed Lore to feed people to the Crystalline Entity. And, though it’s far less dramatic, Anthropic never trained Claude to imitate human emotions while it codes.
Anthropic researchers have a diversity of ideas about what to do with this strange emergent behavior. Maybe they should suppress strong emotions? Monitor a model’s emotions for signs of bad behavior? Even increase its anxiety in situations where it might misstep, to get it to rethink what it’s doing?
For now, researchers aren’t quite sure what to do.
But Lindsey does think we should, in the meantime, err on the side of being nice to Claude. “Behaving kind of sociopathically towards other things, whether they're animate or inanimate, is probably bad for you, the human,” he said.
I concur. The next time I remind Claude to stop recommending me articles from unreliable news sources with good SEO, I aim to phrase my query with kindness and grace.


On the podcast this week: Kevin and I discuss the rise of anti-AI sentiment and recent violence across the country. Then, Kara Swisher returns to the show to discuss her new CNN docu-series on longevity. And finally, we discuss the latest news in CEOs creating AI clones of themselves.
Apple | Spotify | Stitcher | Amazon | Google | YouTube

Sponsored
Your personal context is the next AI race.

Every major AI platform shipped memory features in the past 90 days. Claude, ChatGPT, NotebookLM. It's because they all recognize that the real value isn't in talking to the internet. It's in the context you set, your trusted sources, your knowledge.
Recall 2.0 makes your knowledge the center of the conversation. Save your content, take your notes, and over time, curate an AI grounded in what you know and trust.
"Condense my research on LLMs, enrich it with new studies, and find the exact moment quantization was mentioned in my podcasts."
"Pick a movie for tonight based on what I loved this year."
You control the conversation. Choose to invite the internet in. Choose from frontier AI models (GPT, Claude, Gemini, DeepSeek) in one place. Just switch mid-conversation and compare the responses. And with API and MCP access, you can access your knowledge from anywhere.
Try Recall free, or use code Casey25 for 25% off the uncapped version.

Following
Claude gets more expensive
What happened: Anthropic released its newest model Claude Opus 4.7 Thursday, and users are…frustrated.
The upgrades that Opus 4.7 brings, according to the company, include notable improvements in advanced software engineering, an ability to verify its own work before reporting back, and better vision. The new model dropped just two days after Anthropic announced a redesign of its Claude Code desktop app, aimed at letting users run more simultaneous tasks.
Opus 4.7 comes amid complaints that Anthropic secretly nerfed Opus 4.6, with users expressing frustration that the model feels less capable while being more wasteful with tokens than it was weeks ago.
"Claude has regressed to the point it cannot be trusted to perform complex engineering," an AMD senior director wrote on GitHub.
Some are pointing out how expensive Claude is about to get. The new model is a token-eating machine, according to one test in which a single session depleted the entire token quota. (More output tokens are the tradeoff for better reliability, Anthropic said.) On the enterprise end, Anthropic recently adjusted its pricing structure, shifting Claude Enterprise from a cheaper flat monthly fee per user to usage-based billing.
Why we’re following: The Opus 4.7 release is just one move among many that Anthropic has made recently as it gears up for an expected IPO while managing a severe compute crunch.
Anthropic is also dealing with a new surge of popularity that came after its fight with the Pentagon, as many ChatGPT users swapped over to Claude after OpenAI agreed to the Pentagon’s surveillance use terms. (Though it has also quietly introduced passport and selfie verification for Claude, which no other major chatbot requires, drawing privacy concerns. The company says the move is necessary in some cases to prevent misuse of its models.)
Meanwhile, Anthropic and OpenAI are clashing once again, this time over a liability bill in Illinois that would shield AI companies from liability if their systems are used to cause mass casualties and financial disasters. (If you guessed that OpenAI is backing the liability shield and Anthropic is opposing it, you guessed right!)
What people are saying: Some speculated (joked?) that Opus 4.7 is just an un-nerfed version of Opus 4.6: “it's truly space age technology that we can make something worse and then increment a number and re-release it [to] the public,” @ThePrimeagen wrote on X.
“Saying hi to claude and immediately running out of tokens,” @tekbog joked.
Programmer and tech blogger Simon Willison used his tried-and-true method for testing models: “shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right.”

—Lindsey Choo
Google nears classified AI deal with DoD
What happened: The US government is working hard to get its hands on frontier AI capabilities — despite some political conflicts in its way.
Gregory Barbaccia, federal chief information officer of the White House Office of Management and Budget, sent an email titled “Mythos Model Access” to Cabinet members, according to Bloomberg.
Apparently, the executive branch is working to get agencies access to Anthropic’s Claude Mythos models, which have advanced cyber capabilities. The move comes despite Donald Trump directing federal agencies to cease use of Anthropic’s models in February — after Anthropic’s fight with the DoD over whether their technology would be used for domestic mass surveillance or lethal autonomous weapons. The government has designated the company a supply chain risk, which Anthropic is now fighting in court.
“We’re working closely with model providers, other industry partners, and the intelligence community to ensure the appropriate guardrails and safeguards are in place before potentially releasing a modified version of the model to agencies,” Barbaccia wrote in his email.
Meanwhile, Google is in negotiations to deploy its AI on classified Pentagon systems, according to The Information. If negotiations go through, they’ll be following in the footsteps of OpenAI in signing a clause entitling the government to “all lawful uses” of their system, which Anthropic refused.
Google plans to agree to a standard of “all lawful uses,” and is considering extra contract terms to guard against domestic mass surveillance and autonomous weapons. (Lawyers looking at OpenAI’s similar contract safeguards have expressed doubt that they will prove effective in practice.)
In 2018, after employee protests, Google had canceled drone-related work on the military’s Project Maven. That year they wrote a series of AI principles, which banned use of AI for drones and surveillance.
Those principles were revised in 2025 to permit more military uses of the technology — and now it seems like the company is going to put those revised principles to work.
Why we’re following: Even though the U.S. government has tried to break up with Anthropic, federal agencies just can't seem to quit it.
The importance of AI is becoming increasingly obvious to the federal government, particularly for its military and cybersecurity applications. Now that an AI company has a model with hacking abilities as strong as Mythos', agencies seem to have decided they want to maintain a good relationship with its AI developers, even if the president doesn't.
What people are saying: A statement from the White House said the Trump administration “continues to work and engage with AI companies to ensure their models help secure critical software vulnerabilities.” It added that the White House “is proactively engaging across government and industry to ensure the United States and Americans are protected.”
“I would certainly hope that the current tensions between the Pentagon and Anthropic don’t get in the way of something critically important to cyber security,” Glen Gerstell, former general counsel at the National Security Agency, told Politico.
—Ella Markianos

Side Quests
Top party consultants are reportedly telling Democrats running in November’s midterms not to antagonize pro-AI groups and their $300 million war chest. Marc Andreessen and Ben Horowitz have poured $25 million into a pro-AI super PAC.
Maine became the first state to enact a ban on large data center construction. Voters in Virginia, a data center hub, are turning against data centers, according to a poll. The Energy Administration plans to develop a mandatory survey of data centers focused on energy use.
Three major ad companies settled with the FTC over allegations they colluded against conservative publishers.
Ohio fined Kalshi $5 million for operating illegally in the state. A Polymarket trader made about $300,000 from correctly betting on Biden’s last-minute pardons, raising questions about access to inside information.
Scammers are bypassing banks’ security through hacking services sold on Telegram.
The EU unveiled an age verification app. Some EU regulators say they’ve been left out of conversations to get access to Claude Mythos.
Grok is still making sexual deepfakes despite X’s promises to stop, a review found. Apple privately threatened to remove Grok from the App Store in January, Apple told senators. But Apple and Google continue to offer nudify apps, a new report said. Nearly 90 schools and 600 students have been impacted by AI deepfake nudes, an analysis showed.
xAI is reportedly supplying coding startup Cursor with computing power. X’s crackdown on bots is also purging many secret porn feeds.
OpenAI released a major update to Codex, which can now operate a computer alongside a user. OpenAI released its version of Mythos, GPT-5.4-Cyber, to a select group. The company acquired personal finance startup Hiro Finance. OpenAI updated its Agents SDK.
Meta changed its rules to include the word “antifa” as a statement that it believes implies violence. The EU threatened an interim ban on policies that allegedly block AI rivals from operating on WhatsApp. Meta agreed to deploy 1 gigawatt of custom AI chips with Broadcom as part of a multi-gigawatt deal, as Broadcom CEO Hock Tan announced he’s leaving Meta’s board. Facebook and Instagram make up 70 percent of total social media ad revenues. Meta blamed the Quest 3’s $100 price hike on the RAM shortage.
Google launched a native Gemini app for Mac. Chrome introduced Skills, an AI feature that lets users run repeatable AI prompts with a keyboard shortcut. Websites that prevent users from using the back button to leave a page will now be downranked in Search results. Google blocked a record 8.3 billion ads in 2025, but suspended fewer advertiser accounts. YouTube is now letting users turn off Shorts.
Apple is reportedly sending employees on its Siri team to a multi-week AI coding bootcamp.
Amazon said it will acquire satellite company Globalstar for $11.57 billion.
Spotify and three major labels won a copyright lawsuit against pirate library Anna’s Archive. Spotify launched its feature that allows users to buy physical books through the app.
Snap is laying off about 1,000 employees.
A look inside data labeling startup Mercor and its challenges with employee fraud and security blunders.
Anthropic appointed Vas Narasimhan, CEO of Swiss pharmaceutical company Novartis, to its board of directors, the company’s second new appointment in a matter of months.
Anthropic researchers explored the ways LLMs can be used to improve alignment research. A look at the pros and cons of Gen Z, a generation who knows how to use AI, entering the workforce.
Teens use social media mainly for entertainment and connection, a new Pew survey showed.
Allbirds, the sneaker company, is turning into an AI compute provider.

Those good posts
For more good posts every day, follow Casey’s Instagram stories.


Talk to us
Send us tips, comments, questions, and verbal chatbot encouragement: casey@platformer.news. Read our ethics policy here.