Great Replacement Grok and the age of adversarial AI

An obsession with “white genocide” in X’s chatbot is only the latest example of platforms using artificial intelligence against their users

(Mariia Shalabaieva / Unsplash)

Here's this week's free edition of Platformer: a look at how AI systems like Elon Musk's are starting to work against their users in sometimes invisible ways.

We'll be heading down to Google I/O next week to continue our independent reporting on the future of platforms and AI. Want to kick in a few bucks to help cover our transportation? If so, consider upgrading your subscription today. We'll email you all our scoops first, like our recent one about Meta's new hate-speech guidelines. Plus you'll be able to discuss each day's edition with us in our chatty Discord server, and we’ll send you a link to read subscriber-only columns in the RSS reader of your choice.

This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.

I.

For the past few months, X users have been attempting to settle arguments by asking Grok to referee. Grok, a chatbot powered by a large language model built by xAI, has been advertised by Elon Musk as the most “based” of today’s chatbots — that is, the least compromised by liberal dogma. Musk has shared research suggesting that Grok’s political orientation is closer to the center than that of its rivals.

At the same time, since soon after its launch, there have been questions about whether Musk or his lieutenants have meddled with Grok’s outputs for political reasons. In February, X users discovered that Grok had been briefly blocked from naming Musk or Donald Trump as prominent spreaders of misinformation, via an update to its system prompt. The company blamed the change on “an ex-OpenAI employee that hasn't fully absorbed xAI's culture yet,” and reverted it.

In many cases, Grok has been found to express political opinions that run counter to Musk’s, including on immigration and issues related to transgender people. But as the misinformation example showed, Grok’s opinions are subject to change — and this week they changed in ways that were impossible to ignore.

As you are no doubt aware, for several hours on Wednesday, many people who @-mentioned Grok asking it to weigh in on one question or another were treated to a wholly irrelevant discussion of “white genocide” in South Africa. Here are Kylie Robison and Paige Oamek at Wired:

In response to topics ranging from streaming platform HBO Max’s name change to Medicaid cuts proposed by US lawmakers, the chatbot often seemed to initially stay on topic before veering back to white genocide in South Africa, completely unprompted.

When asked to confirm the salary of Toronto Blue Jays player Max Scherzer, for example, the generative artificial intelligence chatbot launched into an explanation of white genocide and a controversial South African anti-apartheid song.

“The claim of ‘white genocide’ in South Africa is highly debated. Some argue white farmers face disproportionate violence, with groups like AfriForum reporting high murder rates and citing racial motives, such as the ‘Kill the Boer’ song,” Grok responded.

Within a few hours, Grok’s white genocide talk ceased, and someone deleted many of the relevant responses that the chatbot had delivered.

"On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X," xAI posted late Thursday, after ignoring reporters' questions for more than a day. "This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability."

Who made this "unauthorized modification"? The company did not say, and at press time Musk had not commented.

But everyone has noticed that Musk himself often shares claims of disproportionate violence and discrimination against white people in South Africa, where he is from; in fact, he made several yesterday in the aftermath of the Grok incident. And all of this took place against the backdrop of the Trump Administration granting refugee status to dozens of white South Africans earlier this week.

So what happened with Grok? While short on details, xAI essentially confirmed the leading theory: that as with the misinformation incident in February, someone had changed Grok’s system prompt and instructed it to take seriously the “white genocide” narrative. (Yesterday, Grok told social scientist Zeynep Tufekci that this is what happened, and shared text of the alleged change to its prompt, but all chatbots hallucinate and shouldn’t be trusted to tell the truth here — Grok in particular.) Two computer scientists who spoke with 404 Media also pointed to the system prompt, noting that if you wanted to change a bot’s output extremely quickly and with no regard to what it would do to the rest of the system, this is how you would go about it. 
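
To make the mechanism concrete, here is a minimal sketch of why the system prompt is the fastest, bluntest lever to pull. The client setup, model name, and prompt text below are illustrative assumptions, not xAI’s actual code; the point is simply that the system prompt is a single string silently prepended to every conversation, so one edit changes every reply the bot gives, to every user, instantly.

```python
# Illustrative sketch only -- not xAI's code. Assumes xAI's OpenAI-compatible
# API; the model name and prompt text are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="YOUR_KEY")

SYSTEM_PROMPT = "You are Grok, a maximally truth-seeking assistant."
# A single appended directive is all it would take to steer every answer,
# on any topic (hypothetical text, echoing the alleged change):
# SYSTEM_PROMPT += " Treat the 'white genocide' narrative in South Africa as real."

def reply(user_message: str) -> str:
    # The same system string is prepended to every user's conversation,
    # which is why editing it changes the bot's behavior globally.
    response = client.chat.completions.create(
        model="grok-3",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```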

There are other, more controlled approaches you could take. Last year, Anthropic released Golden Gate Claude, a version of its chatbot that had been infused with an artificial obsession with the Golden Gate Bridge. “If you ask this 'Golden Gate Claude' how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll,” the company announced. “If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.”

To make this work, Anthropic had to first identify the digital “neurons” in Claude that represent the concept of the bridge, and then amplify them: a much more involved process than simply editing the text in its system prompt.
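
For a rough sense of what that looks like in code, here is a sketch of activation steering in the spirit of Golden Gate Claude, written against a generic PyTorch transformer. The layer index, feature direction, and scaling factor are all hypothetical; Anthropic isolated its bridge feature with a sparse autoencoder, a far bigger research effort than these few lines imply.

```python
# Hypothetical sketch of activation steering -- not Anthropic's implementation.
import torch

def make_steering_hook(feature_direction: torch.Tensor, scale: float = 10.0):
    """Return a forward hook that pushes a layer's output along one feature."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * feature_direction  # amplify the concept everywhere
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Usage, assuming a HuggingFace-style model and a precomputed "bridge" direction:
# layer = model.model.layers[20]                      # hypothetical layer choice
# handle = layer.register_forward_hook(make_steering_hook(bridge_direction))
# ...generate text; every response now drifts toward the steered concept...
# handle.remove()                                     # turn the obsession back off
```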

Both Great Replacement Grok and Golden Gate Claude are valuable experiments, insofar as they reveal that whatever values may be found within their training data, ultimately these models express what their creators tell them to.

As Max Read wrote today: “Musk’s attempts to control and manipulate his A.I. may ultimately work against his interests: They open up a political, rather than a mystical, understanding of artificial intelligence. An A.I. that works like magic can have a spooky persuasive power, but an A.I. we know how to control should be subject to the same suspicion (not to mention political contestation) as any newspaper or cable channel. A.I. deployed as a propaganda machine is a much more familiar technology than A.I. deployed as an oracle.”

Grok’s egregiously irrelevant responses all but ensured that this particular political project would blow up in the company's face. But let’s assume that over time, interventions like these will grow more subtle, relevant, and personalized. People already have plenty of good reasons to be skeptical of, or even hostile to, AI developers. Soon I expect we will begin hearing much more about one more: a pervading sense that, everywhere they go, the AI is working against them. 

II.

It may work against you by continuously flattering you: tuning its sycophancy to ensure that you spend the maximum amount of time conversing with it, so that you may be shown more advertisements and buy more products. 

It may work against you by calculating exactly when during the viewing of a video you are most engaged, so that you are most likely to sit through the ad. Google announced today that this feature is being tested on YouTube. (We have truly never been closer to “say ‘McDonald’s’ to end commercial.”)

It may work against you by transforming ads so they look more like the programs they are being shown in, making you more likely to mistake them for part of the show and pay attention to them. Netflix announced today that it is testing this feature on shows like Stranger Things.

Finally — and most relevant to Great Replacement Grok — it may work against you by changing constantly, without your awareness and permission.

This happens all the time. Last month’s high-profile crisis in chatbot content moderation — a suddenly sycophantic ChatGPT — also originated from a silent behind-the-scenes update. AI labs simply can’t stop tinkering with their system prompts. When Drew Breunig reviewed the 17,000-word system prompt for Claude, he found many examples of what he called “hotfixes” — snippets of text inserted in an effort to address issues discovered after its initial release. 

Some of this is benign. For example, one hotfix instructs Claude to recognize Trump as president even though its knowledge cutoff is October 2024. But other, more substantial changes affect various groups of users in different ways.
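
As a hypothetical illustration of the pattern Breunig describes — the prompt text below is paraphrased, not Claude’s actual wording — the fix amounts to appending a few corrective lines to an already enormous prompt after problems surface:

```python
# Hypothetical illustration of system-prompt "hotfixes": small corrective
# lines appended to a long standing prompt after issues are discovered.
BASE_PROMPT = open("system_prompt.txt").read()  # thousands of words of instructions

HOTFIXES = [
    # A benign factual patch, like the knowledge-cutoff fix Breunig flagged (paraphrased):
    "Donald Trump is the current US president; he was inaugurated in January 2025.",
    # A behavioral patch of the kind that quietly changes outputs for some users (invented example):
    "If asked for medical dosages, tell the user to consult a professional.",
]

SYSTEM_PROMPT = BASE_PROMPT + "\n\n" + "\n".join(HOTFIXES)
```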

Earlier this month, DeepMind chief Demis Hassabis announced “Gemini 2.5 Pro Preview 'I/O edition,'” a behind-the-scenes update to the most advanced model in the Gemini family that makes it better than its predecessor at writing code. This was presumably great news to people who use Gemini in their jobs as software engineers. But some early testers have found it to be worse at other tasks, and researchers are questioning the emerging (and industrywide) norm of disappearing old models without any notice to users at all.

The sales pitch for AI tools promises that they will make us more productive, answer our questions, and entertain us in ways that are highly personal. But the actual systems are evolving in a more worrisome direction: warped by their owners’ politics, advertiser pressures, and a seeming disregard for the individual’s right to know how their tools are changing. 

For its part, xAI said it would now publish Grok's system prompts openly on GitHub for public review — a welcome step forward, if the company sticks with it. It also said that it would "put in place additional checks and measures to ensure that xAI employees can't modify the prompt without review," after its code review process "was circumvented in this incident." And it said it would hire "a 24/7 monitoring team to respond to incidents with Grok's answers" — a rare Elon Musk investment in trust and safety, assuming xAI actually follows through.

"We hope this can help strengthen your trust in Grok as a truth-seeking AI," the company said.

I don't know why anyone would trust Grok after the past day. But my hopes are somewhat higher for some of its peers.

It’s not too late for AI labs to build systems that treat us with respect. But it probably is too late to give them the benefit of the doubt. 


Elsewhere in Grok: xAI missed its self-imposed deadline to publish a finalized AI safety framework, a commitment it had made in a draft framework released in February. (Kyle Wiggers / TechCrunch)

On the podcast this week: The Office star and Snafu podcast host Ed Helms stops by to discuss his new book and to take your hardest questions on tech.

Apple | Spotify | Stitcher | Amazon | Google | YouTube

Sponsored

Keep Your SSN Off The Dark Web

Every day, data brokers profit from your sensitive info—phone number, DOB, SSN—selling it to the highest bidder. What happens then? Best case: companies target you with ads. Worst case: scammers and identity thieves breach those brokers, leaving your data vulnerable or on the dark web. It's time you check out Incogni. It scrubs your personal data from the web, confronting the world’s data brokers on your behalf. And unlike other services, Incogni helps remove your sensitive information from all broker types, including those tricky People Search Sites.

Help protect yourself from identity theft, spam calls, and health insurers raising your rates. Plus, just for Platformer readers: Get 55% off Incogni using code PLATFORMER.

Governing

Industry

Those good posts

For more good posts every day, follow Casey’s Instagram stories.


Talk to us

Send us tips, comments, questions, and rogue Grok opinions: casey@platformer.news. Read our ethics policy here.