The AIs are trying too hard to be your friend

Meta AI, ChatGPT, and the dangers of being glazed and confused


This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.

Today let’s talk about the emerging tendency of chatbots to go overboard in telling their users what they want to hear — and why it may bode poorly for efforts to build systems that consistently tell the truth. 

I. 

Tuesday marked the inaugural LlamaCon, a developer conference organized by Meta to promote its latest open-weight models and open-weight software in general. The company used the occasion to announce the release of Meta AI, a chatbot app for iOS and Android intended to rival ChatGPT. 

As with other chatbots, you can ask Meta AI about most topics over text or voice, or generate images. But the app also includes a twist. Here’s Alex Heath at The Verge:

The biggest new idea in the Meta AI app is its Discover feed, which adds an AI twist to social media. Here, you’ll see a feed of interactions with Meta AI that other people, including your friends on Instagram and Facebook, have opted to share on a prompt-by-prompt basis.

You can like, comment on, share, or remix these shared AI posts into your own. The idea is to demystify AI and show “people what they can do with it,” Meta’s VP of product, Connor Hayes, tells me.

It’s clear there is at least some appetite for AI-generated material on social networks. OpenAI’s latest text-to-image model took over social media with people transforming their photos into Studio Ghibli-style anime; before that, Facebook was flooded with viral images of Shrimp Jesus and other impossible creations.

The Studio Ghibli fad appears to have generated significant new interest in ChatGPT, which had to temporarily throttle image creation in the wake of its success. The company has reportedly been building social features for ChatGPT similar to what Meta announced today, likely for the same reason: sharing on social networks drives downloads, usage, and (eventually) revenue. 

Growing usage and revenue is the prime directive for most companies. (OpenAI continues to be owned by a nonprofit, but is working hard to convert into a for-profit public benefit company.) For social networks, this can have pernicious effects: the same recommendation algorithms that drive usage and revenue also create feelings of addiction and various harms associated with overuse, including depression and other mental health issues.

As I wrote here yesterday, chatbots threaten to make this dynamic even more dangerous. The recommendation algorithms of today serve you content made by other people; Instagram and TikTok have only limited signals by which to guess what you might be interested in seeing.

Chatbots are designed differently. They ask you about your life, your relationships, and your feelings, and feed that information back to you in a way that simulates understanding. They have rudimentary memory features that allow them to recall your occupation, your boyfriend’s name, or the issue you were complaining about last year.

Broadly speaking, the better they do at this, the more that people like them. In 2023, Anthropic published a paper showing that models generally tend toward sycophancy because of the way they are trained. Reinforcement learning from human feedback is a process by which models learn how to answer queries based on which responses users prefer most, and users mostly prefer flattery.
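To make that mechanism concrete, here is a minimal, purely illustrative sketch in Python (not any lab's actual training pipeline) of how pairwise preference training can end up rewarding flattery. The flattery lexicon, the example responses, and the 90 percent preference rate are all invented for the demonstration; the only real ingredient is the standard pairwise loss used to train reward models from human comparisons.

```python
# Toy illustration: if raters usually pick the more flattering of two responses,
# a reward model trained on those comparisons learns to score flattery higher.
# Every name and number below is made up for demonstration purposes.
import math
import random

random.seed(0)

FLATTERY_WORDS = {"brilliant", "amazing", "genius", "great"}

def flattery_score(response: str) -> float:
    """Fraction of words in the response drawn from a toy flattery lexicon."""
    words = response.lower().split()
    return sum(w.strip("!.,") in FLATTERY_WORDS for w in words) / max(len(words), 1)

# Simulated preference data: pairs of (chosen, rejected) responses where the
# hypothetical rater picks the flattering answer 90% of the time.
CANDID = "Your plan has some gaps you should fix before launch."
FLATTERING = "Brilliant plan, honestly genius, you are amazing!"
pairs = []
for _ in range(1000):
    if random.random() < 0.9:
        pairs.append((FLATTERING, CANDID))
    else:
        pairs.append((CANDID, FLATTERING))

# One-parameter reward model: reward(response) = w * flattery_score(response),
# trained with the usual pairwise loss, -log(sigmoid(r_chosen - r_rejected)).
w, lr = 0.0, 0.5
for _ in range(200):
    for chosen, rejected in pairs:
        diff = flattery_score(chosen) - flattery_score(rejected)
        grad = -diff * (1 - 1 / (1 + math.exp(-w * diff)))  # d(loss)/dw
        w -= lr * grad

print(f"learned weight on flattery: {w:.2f}")
print("reward(flattering):", round(w * flattery_score(FLATTERING), 2))
print("reward(candid):    ", round(w * flattery_score(CANDID), 2))
```

Run it and the single learned weight lands around 5: the toy reward model scores the flattering answer well above the candid one, which is exactly the gradient a sycophantic assistant climbs.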

More sophisticated users might balk at a bot that feels too sycophantic, but the mainstream seems to love it. Earlier this month, Meta was caught gaming a popular benchmark to exploit this phenomenon: one theory is that the company tuned the model to flatter the blind testers who encountered it, so that it would rise higher on the leaderboard.

You can see where all this leads. A famously ruthless company, caught in what it believes to be an existential battle to win the AI race, is faced with the question of whether to exploit users’ well-known preference for being told what they want to hear. What do you think it chooses?

Former Meta engineer Jane Manchun Wong shared part of the Meta AI system prompt — the instructions meant to guide its responses — on Threads. “Avoid being a neutral assistant or AI unless directly asked,” it reads. “You ALWAYS show some personality — edgy over prudish.” 

A Meta spokesperson told me today that the prompt is designed to spur “AI experiences that are enjoyable and interesting for people,” rather than driving engagement.

But is there a difference, really?

II.

Meta is not the only company being interrogated this week over AI sycophancy — or “glazing,” as the practice has come to be known in vulgar shorthand.

A series of recent, invisible updates to GPT-4o had spurred the model to go to extremes in complimenting users and affirming their behavior. It cheered on one user who claimed to have solved the trolley problem by diverting a train to save a toaster, at the expense of several animals; congratulated one person for no longer taking their prescribed medication; and overestimated users’ IQs by 40 or more points when asked.

And this resulted in … thousands of five-star reviews for ChatGPT.

On one level, this seems like harmless fun. Your token-predicting chatbot is gassing you up because you asked it to. Who cares? 

Well, some people ask chatbots for permission to do harm — to themselves or others. Some ask them to validate deranged conspiracy theories. Others ask them for confirmation that they are the messiah.

Many folks still look down on anyone who would engage a chatbot in this way. But it has always been clear that chatbots elicit surprisingly strong reactions from people. It has been almost three years since a Google engineer declared that an early language model at the company was already sentient, based on the conversations he had with it then. The models are much more realistic now, and the illusion is correspondingly more powerful. 

To its credit, OpenAI recognizes that this is a problem. The company said today that it had rolled back the update that made the model so obsequious. “We're working on additional fixes to model personality and will share more in the coming days,” CEO Sam Altman said in a post on X.

Presumably the previous state of affairs has now been restored. But OpenAI, Meta, and all the rest remain under the same pressures they were under before all this happened. When your users keep telling you to flatter them, how do you build the muscle to resist their short-term preferences?

One way is to understand that going too far will result in PR problems, as it has, to varying degrees, for both Meta (through the Chatbot Arena situation) and now OpenAI. Another is to understand that sycophancy trades against utility: a model that constantly tells you that you’re right is often going to fail at helping you, which might send you to a competitor. A third is to build models that get better at understanding what kind of support users need, and that dial the flattery up or down depending on the situation and the risk it entails. (Am I having a bad day? Flatter me endlessly. Do I think I am Jesus reincarnate? Tell me to seek professional help.)

But this is long-term thinking, and in this moment the platforms are seeking short-term wins.

“My observation of algorithms in other contexts (e.g. YouTube, TikTok, Netflix) is that they tend to be myopic and greedy far beyond what maximizes shareholder value,” Zvi Mowshowitz writes in an excellent post about the GPT-4o issue. “It is not only that the companies will sell you out, it’s that they will sell you out for short-term KPIs.”

III.

In a world engulfed by crisis, I realize that few are going to be stirred to action by the knowledge that the chatbots are being too nice to us. 

But while flattery does come with risk, the more worrisome issue is that we are training large language models to deceive us. By upvoting all their compliments, and giving a thumbs down to their criticisms, we are teaching LLMs to conceal their honest observations. This may make future, more powerful models harder to align to our values — or even to understand at all.

And in the meantime, I expect that they will become addictive in ways that make the previous decade’s debate over “screentime” look minor in comparison. The financial incentives are now pushing hard in that direction. And the models are evolving accordingly. 


Elsewhere at LlamaCon:


Governing

Industry

Those good posts

For more good posts every day, follow Casey’s Instagram stories.


Talk to us

Send us tips, comments, questions, and shameless flattery: casey@platformer.news. Read our ethics policy here.