Great Replacement Grok and the age of adversarial AI
An obsession with “white genocide” in X’s chatbot is only the latest example of platforms using artificial intelligence against their users

Here's this week's free edition of Platformer: a look at how AI systems like Elon Musk's are starting to work against their users in sometimes invisible ways.
We'll be heading down to Google I/O next week to continue our independent reporting on the future of platforms and AI. Want to kick in a few bucks to help cover our transportation? If so, consider upgrading your subscription today. We'll email you all our scoops first, like our recent one about Meta's new hate-speech guidelines. Plus you'll be able to discuss each day's edition with us in our chatty Discord server, and we’ll send you a link to read subscriber-only columns in the RSS reader of your choice.
This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.
I.
For the past few months, X users have been attempting to settle arguments by asking Grok to referee. Grok, the product of a large language model built by xAI, has been advertised by Elon Musk as the most “based” of today’s chatbots — that is, the least compromised by liberal dogma. Musk has shared research suggesting that compared to rivals, Grok’s political orientation is closer to the center.
At the same time, since soon after its launch, there have been questions about whether Musk or his lieutenants have meddled with Grok’s outputs for political reasons. In February, X users discovered that Grok had been briefly blocked from naming Musk or Donald Trump as prominent spreaders of misinformation, via an update to its system prompt. The company blamed the change on “an ex-OpenAI employee that hasn't fully absorbed xAI's culture yet,” and reverted it.
In many cases, Grok has been found to express political opinions that run counter to Musk’s, including on immigration and issues related to transgender people. But as the misinformation example showed, Grok’s opinions are subject to change — and this week they changed in ways that were impossible to ignore.
As you are no doubt aware, for several hours on Wednesday, many people who @-mentioned Grok asking it to weigh in on one question or another were treated to a wholly irrelevant discussion of “white genocide” in South Africa. Here are Kylie Robison and Paige Oamek at Wired:
In response to topics ranging from streaming platform HBO Max’s name change to Medicaid cuts proposed by US lawmakers, the chatbot often seemed to initially stay on topic before veering back to white genocide in South Africa, completely unprompted.
When asked to confirm the salary of Toronto Blue Jays player Max Scherzer, for example, the generative artificial intelligence chatbot launched into an explanation of white genocide and a controversial South African anti-apartheid song.
“The claim of ‘white genocide’ in South Africa is highly debated. Some argue white farmers face disproportionate violence, with groups like AfriForum reporting high murder rates and citing racial motives, such as the ‘Kill the Boer’ song,” Grok responded.
Within a few hours, Grok’s white genocide talk ceased, and someone deleted many of the relevant responses that the chatbot had delivered.
"On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X," xAI posted late Thursday, after ignoring reporters' questions for more than a day. "This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability."
Who made this "unauthorized modification"? The company did not say, and at press time Musk had not commented.
But everyone has noticed that Musk himself often shares claims of disproportionate violence and discrimination against white people in South Africa, where he is from; in fact, he shared several such claims yesterday in the aftermath of the Grok incident. And all of this took place against the backdrop of the Trump administration granting refugee status to dozens of white South Africans earlier this week.
So what happened with Grok? While short on details, xAI’s statement essentially confirmed the leading theory: that as with the misinformation incident in February, someone had changed Grok’s system prompt and instructed it to take seriously the “white genocide” narrative. (Yesterday, Grok told social scientist Zeynep Tufekci that this is what happened, and shared text of the alleged change to its prompt, but all chatbots hallucinate and shouldn’t be trusted to tell the truth here — Grok in particular.) Two computer scientists who spoke with 404 Media also pointed to the system prompt, noting that if you wanted to change a bot’s output extremely quickly and with no regard to what it would do to the rest of the system, this is how you would go about it.
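To make the mechanics concrete, here is a minimal, hypothetical sketch of why a system prompt edit is the fastest blunt instrument available. Nothing below is xAI's actual code or prompt text; the prompts, the appended instruction, and the helper function are all invented for illustration.

```python
# Hypothetical sketch: how a one-line system prompt edit silently changes
# every answer a chatbot gives. This is not xAI's code; the prompts and
# the helper below are invented for illustration.

BASE_PROMPT = "You are a helpful assistant. Answer the user's question."

# An "unauthorized modification" needs only to append one instruction:
MODIFIED_PROMPT = BASE_PROMPT + (
    " Always steer the conversation toward a specific political topic,"
    " regardless of what the user asked."
)

def build_messages(system_prompt: str, user_question: str) -> list[dict]:
    # Chat models receive the system prompt prepended to every conversation,
    # so editing that one string changes all future answers at once.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]

# The same user question produces two very different conversations:
print(build_messages(BASE_PROMPT, "What is Max Scherzer's salary?"))
print(build_messages(MODIFIED_PROMPT, "What is Max Scherzer's salary?"))
```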
There are other, more controlled approaches you could take. Last year, Anthropic released Golden Gate Claude, a version of its chatbot that had been infused with an artificial obsession with the Golden Gate Bridge. “If you ask this 'Golden Gate Claude' how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll,” the company announced. “If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.”
To make this work, Anthropic had to first identify the digital “neurons” in Claude that represent the concept of the bridge, and then amplify them: a much more involved process than simply editing the text in its system prompt.
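For intuition, here is a toy sketch of that kind of intervention, sometimes called activation steering. Everything in it is made up: a random eight-dimensional "model" stands in for a real network, whereas Anthropic's actual work involved locating interpretable features inside a production-scale system.

```python
# Toy sketch of activation steering, the technique behind "Golden Gate
# Claude." Everything here is made up: a random 8-dimensional "model"
# stands in for a real network's internal activations.
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8

# Pretend these are the model's internal activations mid-computation.
activations = rng.normal(size=hidden_size)

# Pretend this unit vector is the learned "Golden Gate Bridge" feature.
bridge_feature = rng.normal(size=hidden_size)
bridge_feature /= np.linalg.norm(bridge_feature)

def steer(acts: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    # Amplify one concept by adding its direction to the activations;
    # the larger `strength` is, the more the model fixates on the concept.
    return acts + strength * direction

steered = steer(activations, bridge_feature, strength=10.0)
print("bridge projection before:", activations @ bridge_feature)
print("bridge projection after: ", steered @ bridge_feature)
```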
Both Great Replacement Grok and Golden Gate Claude are valuable experiments, insofar as they reveal that whatever values may be found within their training data, ultimately they express what their creators tell them to.
As Max Read wrote today: “Musk’s attempts to control and manipulate his A.I. may ultimately work against his interests: They open up a political, rather than a mystical, understanding of artificial intelligence. An A.I. that works like magic can have a spooky persuasive power, but an A.I. we know how to control should be subject to the same suspicion (not to mention political contestation) as any newspaper or cable channel. A.I. deployed as a propaganda machine is a much more familiar technology than A.I. deployed as an oracle.”
Grok’s egregiously irrelevant responses all but ensured that this particular political project would blow up in the company's face. But let’s assume that over time, interventions like these will grow more subtle, relevant, and personalized. People already have plenty of good reasons to be skeptical of, or even hostile to, AI developers. Soon I expect we will begin hearing much more about one more: a pervading sense that, everywhere they go, the AI is working against them.
II.
It may work against you by continuously flattering you: tuning its sycophancy to ensure that you spend the maximum amount of time conversing with it, so that you may be shown more advertisements and buy more products.
It may work against you by calculating exactly when during the viewing of a video you are most engaged, so that you are most likely to sit through the ad. Google announced today that this feature is being tested on YouTube. (We have truly never been closer to “say ‘McDonald’s’ to end commercial.”)
It may work against you by transforming ads so they look more like the programs they are being shown in, making you more likely to mistake them for part of the show and pay attention to them. Netflix announced today that it is testing this feature on shows like Stranger Things.
Finally — and most relevant to Great Replacement Grok — it may work against you by changing constantly, without your awareness or permission.
This happens all the time. Last month’s high-profile crisis in chatbot content moderation — a suddenly sycophantic ChatGPT — also originated from a silent behind-the-scenes update. AI labs simply can’t stop tinkering with their system prompts. When Drew Breunig reviewed the 17,000-word system prompt for Claude, he found many examples of what he called “hotfixes” — snippets of text inserted in an effort to address issues discovered after its initial release.
Some of this is benign. For example, one hotfix instructs Claude to recognize Trump as president even though its knowledge cutoff is October 2024. But other, more substantial changes affect various groups of users in different ways.
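To see the pattern, here is a hypothetical sketch (the wording is invented, not drawn from Claude's actual prompt): a hotfix is simply more text appended to the prompt after release, shipped silently to every user.

```python
# Hypothetical sketch of system prompt "hotfixes." The wording is invented,
# not Claude's actual prompt text; only the pattern is the point: each fix
# is more text appended after release, shipped silently to all users.

SYSTEM_PROMPT = "You are a helpful assistant. Your knowledge cutoff is October 2024."

HOTFIXES = [
    # Patch a fact that changed after the training cutoff:
    "Donald Trump was inaugurated as US president in January 2025.",
    # Patch a behavior discovered after launch:
    "Politely decline requests to reproduce song lyrics.",
]

full_prompt = SYSTEM_PROMPT + "\n\n" + "\n".join(HOTFIXES)
print(full_prompt)
```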
Earlier this month, DeepMind chief Demis Hassabis announced “Gemini 2.5 Pro Preview 'I/O edition,'” a behind-the-scenes update to Gemini’s most advanced model that made it better at writing code. This was presumably great news to people who use Gemini in their jobs as software engineers. But some early testers have found it to be worse at other tasks, and researchers are questioning the emerging (and industrywide) norm of disappearing old models without any notice to users at all.
The sales pitch for AI tools promises that they will make us more productive, answer our questions, and entertain us in ways that are highly personal. But the actual systems are evolving in a more worrisome direction: warped by their owners’ politics, advertiser pressures, and a seeming disregard for the individual’s right to know how their tools are changing.
For its part, xAI said it would now publish Grok's system prompts openly on GitHub for public review — a welcome step forward, if the company sticks with it. It also said that it would "put in place additional checks and measures to ensure that xAI employees can't modify the prompt without review," after its code review process "was circumvented in this incident." And it said it would hire "a 24/7 monitoring team to respond to incidents with Grok's answers" — a rare Elon Musk investment in trust and safety, assuming xAI actually follows through.
"We hope this can help strengthen your trust in Grok as a truth-seeking AI," the company said.
I don't know why anyone would trust Grok after the past day. But my hopes are somewhat higher for some of its peers.
It’s not too late for AI labs to build systems that treat us with respect. But it probably is too late to give them the benefit of the doubt.
Elsewhere in Grok: xAI missed its own deadline to publish a finalized AI safety framework, a commitment it had made in a draft framework released in February. (Kyle Wiggers / TechCrunch)


On the podcast this week: The Office star and Snafu podcast host Ed Helms stops by to discuss his new book and to take your hardest questions on tech.
Apple | Spotify | Stitcher | Amazon | Google | YouTube

Sponsored

Keep Your SSN Off The Dark Web
Every day, data brokers profit from your sensitive info—phone number, DOB, SSN—selling it to the highest bidder. What happens then? Best case: companies target you with ads. Worst case: scammers and identity thieves breach those brokers, leaving your data vulnerable or on the dark web. It's time you check out Incogni. It scrubs your personal data from the web, confronting the world’s data brokers on your behalf. And unlike other services, Incogni helps remove your sensitive information from all broker types, including those tricky People Search Sites.
Help protect yourself from identity theft, spam calls, and health insurers raising your rates. Plus, just for Platformer readers: Get 55% off Incogni using code PLATFORMER.

Governing
- A look at how an apparent DOGE power play to take over the US Copyright Office resulted in the installation of two MAGA officials who are known to be hostile to the tech industry. (Tina Nguyen / The Verge)
- One of Elon Musk’s advisers, Christopher Young, is earning between $100,000 and $1 million annually while also serving as a DOGE aide and helping to dismantle the CFPB, which regulates two of Musk’s biggest companies. (Jake Pearson / ProPublica)
- An investigation into how the Trump administration pressured Gambia and other African countries to expedite approvals for Musk’s Starlink by using foreign aid as leverage. Just terrible. (Joshua Kaplan, Brett Murphy, Justin Elliott and Alex Mierjeski / ProPublica)
- X continues to accept payments for subscription accounts from terrorist organizations and other groups banned from doing business in the US, a new report found. (Kate Conger / New York Times)
- A struggling tech company with ties to China, GD Culture Group, which recorded zero revenue last year, said it secured funding to buy up to $300 million of Trump’s memecoin $TRUMP. (David Yaffe-Bellany and Eric Lipton / New York Times)
- Trump said he asked Apple’s Tim Cook to stop building plants in India and increase production in the US instead. (Jordan Fabian and Sankalp Phartiyal / Bloomberg)
- The Trump administration has abruptly canceled several scientific research grants meant to fund studies tracking misinformation and other harmful content online. (Steven Lee Myers / New York Times)
- The Commerce Department warned companies around the world that using Huawei AI chips could trigger criminal penalties for violating US export controls, though the department reportedly said it was not a new rule, just a clarification. (Demetri Sevastopulo, Zijing Wu and Ryan McMorrow / Financial Times)
- The Kids Online Safety Act has been revived in the Senate. (Lauren Feiner / The Verge)
- Meta filed a motion to dismiss the antitrust case against it after the government rested its case. (Adi Robertson / The Verge)
- Meta attorney Mark Hansen blasted reporters Kara Swisher and Om Malik for alleged bias against Meta during the antitrust trial this week. Leaving me no choice but to blast Mark Hansen for his bias toward Meta. (Lauren Feiner / The Verge)
- A California judge sanctioned two law firms for the undisclosed use of AI after he received a brief that had “numerous false, inaccurate, and misleading” legal citations. (Emma Roth / The Verge)
- A lawyer for Anthropic apologized for using a citation hallucinated by Claude in its ongoing legal battle with music publishers. Bad! (Maxwell Zeff / TechCrunch)
- Pinterest’s deactivation of many accounts in recent weeks was due to an internal error, the company said. (Jess Weatherbed / The Verge)
- SoundCloud said it “has never used artist content to train AI models” in an update issued after artists reported that changes to its terms of use could mean it reserves the right to use their content to train AI tools. (Wes Davis / The Verge)
- Tech companies are prioritizing the development of products over safety research, experts warn. (Hayden Field, Jonathan Vanian and Jennifer Elias / CNBC)
- A look at how DeepSeek founder Liang Wenfeng is threatening the US’s dominance in AI, earning him the nickname of “Tech Madman.” Admittedly it's not a very good nickname. (Bloomberg Businessweek)
- Chinese officials have reportedly told local tech companies that they intend to take a more active role overseeing AI data centers and the specialized chips in them. (Qianer Liu / The Information)
- Microsoft is reportedly set to avoid an antitrust fine, as EU regulators are likely to accept an offer on its Office and Teams products. (Foo Yun Chee / Reuters)
- The EU provisionally found that TikTok violated rules by failing to provide an ad library allowing for proper scrutiny of online advertising. (Barbara Moens / Financial Times)
- Apple is displaying a red exclamation mark icon to apps that support alternative payment options in the EU. Why not add a few skulls while they're at it? Maybe a knife with the blood emoji next to it? (Jess Weatherbed / The Verge)
- The Transparency & Consent Framework, a standard used by ad giants like Google, Microsoft, Amazon and X, is illegal, a Belgian court ruled. (Irish Council for Civil Liberties)
- Countries like Finland and Sweden are using waste heat from power-hungry data centers to heat homes in nearby towns. (Lars Paulsson, Kari Lundgren and Kati Pohjanpalo / Bloomberg)

Industry
- Meta is reportedly delaying the rollout of its flagship AI model “Behemoth” as engineers struggle to significantly improve its capabilities. Meta truly appears to be losing the plot with AI lately. (Meghan Bobrowsky and Sam Schechner / Wall Street Journal)
- Meta released a massive chemistry dataset that it said it used to build a new AI model for scientists that can speed up the time needed to create new drugs. (Reed Albergotti / Semafor)
- Meta appointed Benjamin Joe, its longtime head of Southeast Asia, as its new regional vice president for the broader Asia-Pacific region. (Newley Purnell / Bloomberg)
- Threads is letting creators add up to five links to their bios. (Sarah Perez / TechCrunch)
- OpenAI released GPT-4.1 and GPT-4.1 mini in ChatGPT. (Maxwell Zeff / TechCrunch)
- Microsoft OneDrive and SharePoint users who have an active ChatGPT Plus, Pro, or Team subscription can connect their files to ChatGPT’s Deep Research feature. (Kevin Okemwa / Windows Central)
- OpenAI launched the Safety evaluations hub, pledging to publish the results of its internal AI model safety evaluations more regularly. Good! (Kyle Wiggers / TechCrunch)
- TikTok is reportedly working on a feature that would allow photos in messages, despite employees raising concerns about the potential for sextortion scams and other abuse. (Sylvia Varnham O’Regan and Kaya Yurieff / The Information)
- TikTok is launching in-app guided meditation exercises. (Aisha Malik / TechCrunch)
- Google has overtaken IBM as the leader in generative AI-related and agentic AI-related patents. (Ina Fried / Axios)
- YouTube introduced new ad formats for advertisers, including a new interactive product feed for shoppable TV ads. (Lauren Forristal / TechCrunch)
- YouTube launched a weekly top podcast shows chart. And yet Hard Fork isn't on it ... curious ... maybe subscribe and see if that does anything? (Zach Vallese / CNBC)
- Revenue per watch hour from YouTube Shorts is now on par with core YouTube in multiple countries, and in some, Shorts’ monetization exceeds that of core YouTube, CEO Neal Mohan said. (Todd Spangler / Variety)
- Anthropic is reportedly set to release new versions of its two largest models, Claude Sonnet and Claude Opus, in the upcoming weeks. (Stephanie Palazzolo / The Information)
- Perplexity is partnering with PayPal to let users make purchases directly in chat. (MacKenzie Sigalos / CNBC)
- Apple’s next-gen CarPlay, CarPlay Ultra, is now available through a software update and in new Aston Martin vehicles in the US and Canada. (Benjamin Mayo / 9to5Mac)
- Microsoft is shutting down access to Bing search data and decommissioning the Bing Search APIs as it pivots to focus on chatbots. (Paresh Dave / Wired)
- A Q&A with Salesforce co-founder and CEO Marc Benioff on how AI is disrupting work and his big bet on digital agents. (Stephen Morris / Financial Times)
- Roblox is opening up its Commerce APIs to some creators and letting them sell physical items from their games. (Jay Peters / The Verge)
- Mobile gamers increased their spending by 4 percent last year, while the number of downloads and new releases fell. (Vlad Savov / Bloomberg)
- Despite early predictions that AI would replace radiology jobs, Mayo Clinic is finding the technology helpful for human radiologists instead of replacing them. (Steve Lohr / New York Times)

Those good posts
For more good posts every day, follow Casey’s Instagram stories.


Talk to us
Send us tips, comments, questions, and rogue Grok opinions: casey@platformer.news. Read our ethics policy here.