OpenAI

Inside OpenAI’s new plan for fighting deepfakes

The company has new strategies for shoring up our shared sense of reality. Will they matter in the election?

Casey Newton

May 7, 2024

A "digital nutrition label" featuring metadata from the C2PA. (C2PA)

On Tuesday, OpenAI joined the steering committee of a little-known but increasingly powerful standards body known as the Coalition for Content Provenance and Authenticity, or C2PA. In doing so, the company waded into one of the most critical debates about democracy in the age of artificial intelligence: how will we know that what we are seeing online is real? And what should tech companies do when they spot a fake?

With just under six months to go in the US presidential campaign, the biggest social networks have mostly managed to stay out of election coverage. With the top match-up a repeat of 2020, the candidates and their respective positions are well known by many voters. Whatever happens in November — and unlike in the epoch-shifting election of 2016 — it is difficult to imagine a scenario in which platform policies play a decisive role in the outcome.

At the same time, that doesn’t mean platforms will have an easy time this year. Elections affecting half the world’s population are scheduled to take place, and after years of layoffs, platforms will be monitoring them with much smaller trust and safety teams than they once had. And there’s at least one complicating variable that companies like Meta, Google, and TikTok have never had to manage in an election before: generative artificial intelligence, which threatens to overwhelm their platforms with synthetic media that could spread misinformation, foreign influence campaigns, and other potential harms.

At this point, anyone whose Facebook feed has been taken over by Shrimp Jesus — or simply perused the best AI fakes from this year’s Met Gala — knows how easily AI-generated words and images can quickly come to dominate a social platform. For the most part, the outlandish AI images that have attracted media coverage this year appear to be part of a standard spam playbook: quickly grow an audience via clickbait, and then aggressively promote low-quality goods in the feed for however long you can.

In the coming weeks and months, platforms are likely to be tested by a higher-stakes problem: people attempting to manipulate public opinion with synthetic media. Already we’ve seen deepfakes of President Biden and Donald Trump. (In both of those cases, incidentally, the perpetrators were connected to different rival campaigns.) Synthetic media targeted at races at the local, state, and national level is all but guaranteed.

In February, the big platforms signed an agreement at the Munich Security Conference committing to fight against deepfakes that attempt to deceive voters.

But with a flood of fakes expected to begin arriving any week now, what steps should Instagram, YouTube, and all the rest be taking?

There are two basic tasks here for the tech companies: figure out what’s fake, and then figure out what to do about it

The first, and arguably most difficult, task for platforms is to identify synthetic media when they see it. For a decade now, platforms have worked to develop tools that let them sniff out what Meta calls “inauthentic behavior”: fake accounts, bots, coordinated efforts to boost individual pieces of content, and so on. It can be surprisingly hard to identify this stuff: bad actors’ tactics continually evolve as platforms get wise to them.

The C2PA was founded in 2021 by a group of organizations including Microsoft, Adobe, Intel, and the BBC. The goal was to develop a royalty-free, open technical standard that adds tamper-evident metadata to media files. If you create an image with Photoshop, for example, Adobe adds metadata to the image that identifies the date it was created, the software where it originated, and any edits made to it, among other information. From there, anyone can use a free tool like Content Credentials to see where the image came from.
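To make the idea of metadata that reveals tampering concrete, here is a minimal sketch in Python. It is not the C2PA format: the real standard relies on certificate-based signatures and embeds its manifest inside the media file, while this illustration uses a shared-secret HMAC and a plain dictionary. The key, the function names, and the field names are all assumptions made for the example.

```python
import hashlib
import hmac
import json

# Hypothetical signing key for illustration only. Real C2PA signers use
# certificate-based signatures rather than a shared secret.
SIGNING_KEY = b"demo-key-not-for-production"


def make_manifest(media: bytes, tool: str, created: str) -> dict:
    """Build a provenance record bound to the exact bytes of the media."""
    claim = {
        "tool": tool,
        "created": created,
        "content_sha256": hashlib.sha256(media).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(media: bytes, manifest: dict) -> bool:
    """Return True only if neither the claim nor the media has changed."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False  # the claim itself was altered after signing
    actual = hashlib.sha256(media).hexdigest()
    return actual == manifest["claim"]["content_sha256"]


image = b"pretend these are image bytes"
manifest = make_manifest(image, tool="Photoshop", created="2024-05-07")

print(verify(image, manifest))              # True
print(verify(image + b"edited", manifest))  # False
```

The check fails if either the media or the signed claim changes, which is what lets a viewer trust the record. Nothing in this sketch stops someone from discarding the manifest altogether.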

OpenAI said in January that it would begin adding this metadata to images created with its DALL-E 3 text-to-image tool and its Sora video generation tool. In joining the C2PA, the company is endorsing this metadata-driven approach to help tell what’s real from what’s fake.

“The world needs common ways of sharing information about how digital content was created,” the company said in a blog post. “Standards can help clarify how content was made and provide other information about its origins in a way that’s easy to recognize across many situations — whether that content is the raw output from a camera, or an artistic creation from a tool like DALL-E 3.”

Of course, there are some big flaws in any plan that relies on metadata. For one thing, most media uploaded to social platforms likely isn’t made with a tool that adds content credentials. For another, you can easily take a screenshot of an AI-generated image to eliminate its metadata.

To fully solve the problem, you need to try several different things at once.

OpenAI has a few different ideas on the subject. In addition to joining the C2PA, on Tuesday it also released a deepfake detector to a small group of researchers for testing. The tool can identify 98.8 percent of images made with DALL-E 3, the company said, but performs worse on images generated by rivals like Midjourney. In time, though, the company hopes it will serve as a check on media uploaded without content credentials, although it is not sharing the tool with platforms like Facebook or Instagram just yet.

“We think it’s important to build our understanding of the tool much more before it’s let loose in the wild,” Sandhini Agarwal, who leads trustworthy AI efforts at the company, told me. Among other things, OpenAI is still working to understand how the classifier responds to different sets of images. For example, there are concerns that images from non-Western countries, which may be underrepresented in the tool’s underlying data set, might be falsely identified as fakes.

OpenAI is also exploring ways to add inaudible digital watermarks to AI-generated audio that would be difficult to remove. “Fingerprinting,” which allows companies to compare media found in the wild to an internal database of media known to be generated by their own tools, offers another possible path forward.
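Fingerprinting can be illustrated with a toy example. The sketch below is not OpenAI's method, which the company has not described in detail here. It swaps in a simple "average hash," a well-known perceptual hashing technique, applied to tiny grayscale grids. The function names, the distance threshold, and the sample data are all hypothetical.

```python
from typing import Optional


def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_match(candidate: list[list[int]], known: dict[str, int],
               max_distance: int = 2) -> Optional[str]:
    """Return the ID of a known fingerprint close to the candidate, if any."""
    fingerprint = average_hash(candidate)
    for media_id, stored in known.items():
        if hamming(fingerprint, stored) <= max_distance:
            return media_id
    return None


# A 4x4 "image" the generator produced, recorded at creation time.
generated = [[10, 200, 10, 200],
             [200, 10, 200, 10],
             [10, 200, 10, 200],
             [200, 10, 200, 10]]
known = {"gen-001": average_hash(generated)}

# The same image found in the wild, slightly brightened by re-encoding.
brightened = [[min(255, p + 5) for p in row] for row in generated]
# An unrelated image.
unrelated = [[0, 0, 255, 255] for _ in range(4)]

print(find_match(brightened, known))  # gen-001
print(find_match(unrelated, known))   # None
```

Unlike metadata, a fingerprint is stored by the company rather than attached to the file, so a lookup of this kind can still work after a screenshot or re-upload strips the credentials, as long as the image itself is not changed too much.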

In the meantime, OpenAI and Microsoft said Tuesday that they would set aside $2 million for a digital literacy program designed to help educate people about AI-generated media.

“As we've been on a sort of listening tour to understand what the biggest concerns are leading up to elections in various countries, we’ve heard that there just aren’t high levels of AI literacy,” said Becky Waite, who leads OpenAI’s election preparedness efforts. “There’s a real concern about people's ability to distinguish what they're seeing online. So we're excited about this as an area of investment.”

No one I spoke with seemed certain that these approaches could collectively prevent the infocalypse. There are still technical breakthroughs to be achieved in order to create metadata that cannot be removed, for example. And it will take time for voters’ digital literacy to catch up to the state of the art. (Particularly since all these high-tech credentials remain mostly invisible to the average user.)

At the same time, you can begin to understand how platforms will soon use technology to build trust in what people are seeing in their feeds. A combination of metadata, watermarks, fingerprinting and deepfake detection should go a long way in determining the provenance of much of the content on social networks. That could be particularly true if platforms adopt policies that encourage people to share media that comes with those credentials.

What would those policies look like? One possibility is that platforms could create a system for digital content that resembles passport control. Just as most people are not allowed to cross borders without passports, someday social networks could choose not to display posts that arrive without credentials. Or, more likely, they could restrict this type of content’s reach: showing it to an account’s followers, for example, but making it ineligible for promotion in ranked feeds.
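The two options described above, blocking uncredentialed posts outright or merely limiting their reach, can be written down as a small decision rule. This is a purely hypothetical sketch: no platform has announced such a rule, and the `Post`, `Reach`, and `decide_reach` names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Reach(Enum):
    FULL = "eligible for followers and ranked feeds"
    FOLLOWERS_ONLY = "shown to followers, not promoted in ranked feeds"
    HIDDEN = "not displayed"


@dataclass
class Post:
    author: str
    has_credentials: bool  # assumed to mean verified provenance metadata


def decide_reach(post: Post, strict: bool = False) -> Reach:
    """Apply a 'passport control' rule to a single post.

    strict=True models the hard border: no credentials, no display.
    strict=False models the softer option: limited distribution.
    """
    if post.has_credentials:
        return Reach.FULL
    return Reach.HIDDEN if strict else Reach.FOLLOWERS_ONLY


print(decide_reach(Post("alice", has_credentials=True)).name)   # FULL
print(decide_reach(Post("bob", has_credentials=False)).name)    # FOLLOWERS_ONLY
print(decide_reach(Post("bob", has_credentials=False), strict=True).name)  # HIDDEN
```

The single `strict` flag is where the trade-off lives: a platform choosing between the two modes is choosing how much legitimate but uncredentialed content it is willing to suppress.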

“There are tricky trade-offs to be made here,” said David Robinson, who leads OpenAI’s policy planning team. “I'm sure different platforms will make different choices.”

We’re already seeing hints of this: in March, YouTube added a way for creators to label their own videos as AI-generated; Meta followed suit with a similar approach last month. For now, platforms aren’t penalizing users for uploading AI-generated content, which is fine: most of it is relatively benign. As more pernicious forms of AI content begin to materialize on platforms, though, they may begin to consider more restrictive approaches.

“It’s not just a technical problem,” Agarwal told me. “I think that’s the crux of why it’s so complex. Because it’s not just about having the technical tools. It’s about having resilience across the stack. We’re one actor, but there are many others. This is a collective action problem.”
