Facebook scrapes some researchers

Banning NYU's Ad Observer team was probably inevitable — but it didn't have to be

Programming note: Platformer is off next week. As always, you can see the posting schedule here.

More and more, I find myself wondering why we built a world in which so much civic discourse takes place inside a handful of giant digital shopping malls. Today has been one of those days.

So let’s talk about Facebook’s decision to disable the pages and personal accounts associated with the Ad Observatory project at New York University, which took data that had been volunteered by willing Facebook users and analyzed it in an effort to better understand the 2020 election and other subjects in the public interest.

In one corner, you have academic researchers working to understand the platform’s effects on our democracy. In the other, you have a company battered by nearly two decades of privacy scandals and regulatory fines, forever terrified that a Cambridge Analytica sequel is lurking somewhere on the platform.


I first wrote about this case in October, when Facebook sent its initial cease-and-desist notice to the researchers. The issue concerned a browser extension created by an NYU team that, if installed, collects data about the ads you see on Facebook, including information about how those ads are targeted. Facebook already makes similar data publicly available through its online ad archive, but the NYU researchers say it’s incomplete and sometimes inaccurate — among other things, they say, many political ads are never labeled as such.

No one I have spoken to at Facebook believes that NYU’s work is not fundamentally in the public interest. Other media for political advertising don’t allow campaigns to target voters with nearly the level of precision that Facebook does, and the lingering belief that Facebook swung the 2016 election to Donald Trump drew heightened scrutiny to the company’s ad practices in 2020. It’s no wonder academics want to study the platform.

Anticipating this interest, the company established the Facebook Open Research and Transparency platform earlier this year. But like most of the company’s academic partnerships, FORT has been criticized for providing too limited a view of Facebook. In the case of the election, for example, it will only provide data from the 90 days before Election Day — despite the fact that the presidential campaign lasted well over a year. Moreover, researchers say, FORT requires them to access data on a laptop furnished by Facebook, preventing them from using their own machine-learning classifiers and other tools on the data available.

That’s why, when the NYU team received that cease-and-desist last fall, they said they planned to ignore it. “The only thing that would prompt us to stop doing this would be if Facebook would do it themselves, which we have called on them to do,” researcher Laura Edelson told the Wall Street Journal.

Facebook said it wouldn’t ban NYU until well after the election, and was true to its word. But on Tuesday night, the company dropped the hammer on the NYU team. “We took these actions to stop unauthorized scraping and protect people’s privacy in line with our privacy program under the FTC order,” said Mike Clark, a product management director, referring to Facebook’s consent decree with the Federal Trade Commission. 

Alex Abdo, an attorney for the NYU researchers, told me that he was taken aback by Facebook’s actions.

“On the one hand, it's not surprising — on the other hand, it's totally shocking that Facebook's response to research that the public really needs right now is to try to shut it down,” he said in an interview. “Privacy in research and social media is a genuinely hard question. But the answer can't be that Facebook unilaterally decides. And there is not an independent research project out there that is more respectful of user privacy than the Ad Observer.”


So let’s talk about privacy. The Ad Observer was designed to collect data about individual ads and the people they were targeted at, and also to anonymize that data. Mozilla, the nonprofit organization behind the Firefox browser, conducted a review of the extension’s code and its consent flow and ultimately recommended that people use it.

“We decided to recommend Ad Observer because our reviews assured us that it respects user privacy and supports transparency,” Marshall Erwin, the company’s chief security officer, said in a blog post. “It does not collect personal posts or information about your friends. And it does not compile a user profile on its servers.” 

You probably won’t be surprised to learn that Facebook sees it differently. Despite the lengths to which the researchers have gone here, the company told me, the Ad Observer still collects data that some users may object to. If an individual pays to boost a post, such as for a fundraiser, information including that user’s name and photo winds up in the NYU researchers’ hands. The Ad Observer may also collect similar information from comments on ads. And Facebook says information gleaned from an ad’s “why am I seeing this?” panel “can be used to identify other people who interacted with the ads and determine personal information about them.”

In any of these cases, the actual harm to the user would seem to be extremely minor, if you can call it a harm at all. But Facebook says it’s against its rules, and it has to enforce those rules, not least because Cambridge Analytica was a story about a researcher with seemingly good intentions who ultimately sold off the data he collected and created arguably the biggest scandal in company history.

It’s for that reason that I have at least some empathy for Facebook here. The company is continuously under fire for the way it collects and uses personal data, and here you have a case where the company is trying to limit that data collection, and many of the same critics who are still bringing up Cambridge Analytica on Twitter three years later are simultaneously arguing that Facebook has a moral obligation to let the Ad Observatory slide.

But letting things slide is not really in the spirit of the General Data Protection Regulation, California’s own privacy act, and any number of other privacy regulations. (As one smart person put it today in our Sidechannel server: “GDPR does not have a general research exemption.”)

Contrary to some earlier reporting, Facebook is not arguing that Ad Observer violates its FTC consent decree, it told me today. But the company has at least some good reasons to prevent large-scale data scraping like the kind represented by the NYU researchers. The rise of Clearview AI, a dystopian surveillance company that built a facial recognition system in part by scraping publicly available photos from Facebook, has made that case in a visceral way this year.


While the fight between NYU and Facebook got ugly today, I think there are some obvious (though difficult) paths forward.

One is that Facebook could expand its current data export tools to allow us to contribute our data to projects like the Ad Observer voluntarily, but in an even more privacy-protective way. To hear Facebook tell it, if NYU’s browser extension had collected just a few fewer types of data, it might have been palatable to the company.

If you believe users have a right to discuss their personal experiences on Facebook, I think you should also agree they have a right to volunteer personal data that speaks to that experience. By Facebook’s nature, anyone’s personal experience is going to have a lot of other potentially non-consenting friends’ data wrapped up in it, too. But the company already lets me export my friends’ data — when they tag me in comments, send me Facebook messages, and so on. The company is already way closer to figuring out a way to let me share this information with researchers than it may seem.

Another option — rarely used in the United States — is that Congress could pass a law. It could write national privacy legislation, for example, and create a dedicated carveout for qualified academic researchers. It could require platforms to disclose more data in general, to academics and everyone else. It could establish a federal agency dedicated to the oversight of online communication platforms.

The alternative, as always, is to wait for platforms to regulate themselves — and to continuously be disappointed by the result.

The NYU-Facebook spat was always going to end up in the place we find it today: neither side had any good incentive to back down. But we all have reason to hope that researchers and tech companies come to better terms. Too much is at stake for the platforms to remain a black box forever.

“You would think they would be able to distinguish between the Cambridge Analyticas of the world, and the good-faith, privacy-respecting researchers of the world,” Abdo told me. “If they can't do that, then there really is no hope for independent research on Facebook's platform.”

If Facebook can’t — or won’t — make that distinction, Congress should make it for them.


Tech platforms may not be able to rely on sending users to third-party websites to opt out of having their data sold, according to the California attorney general’s interpretation of the state’s privacy law. Here’s Kate Kaye at Digiday:

When a media and entertainment conglomerate directed people to a third-party trade association’s digital ad opt-out tool, it wasn’t an appropriate opt-out, said the state’s OAG, which said it amounted to a failure to allow people to opt-out from the sale of their personal information. And, when a pet industry site forced people to use a trade group’s digital ad opt-out tool, the OAG alleged the company was not in compliance with the law.

The Federal Trade Commission warned companies to “merge at your own risk” after a surge in acquisitions prevented it from completing reviews in the standard 30-day window. The agency is reserving the right to investigate after that window closes, even if it hasn’t begun its review. (Issie Lapowsky / Protocol)

Amazon is resisting a vaccine mandate at its warehouses for fear of alienating skeptical employees in a tight labor market. “‘A lot of the associates do not want to be forced to get something,’ said a manager at an East Coast warehouse. If Amazon rolled out a vaccine mandate, ‘it wouldn’t go over well. They’d lose a lot of employees if they do that.’” (Matt Day and Spencer Soper / Bloomberg)

A look at the expanding number of labor initiatives surrounding Amazon. In addition to a new union election in Alabama, the company also faces a coordinated boycott and protest campaign from the Teamsters, and a law in California that would force the company to reveal its productivity quotas. (Noam Scheiber / New York Times)

Members of national militia groups are forming smaller local militias to continue recruiting on Facebook without being banned. They are also recommending that militia members use Telegram, where they are less likely to be banned. (Avani Yadav and Jared Holt / DFRLab)

A superspreader of vaccine misinformation said he would begin deleting all of his articles after 48 hours in an effort to evade enforcement. It’s always risky to announce your ban evasion strategies to platforms in advance! (Davey Alba / New York Times)

Social networks often moderate content in a way that disadvantages marginalized groups, according to new research from the NYU School of Law. The report calls on Congress “to establish a federal commission to investigate how to best facilitate the disclosure of platform data to enable independent research and auditing, protect privacy, and establish whistleblower protection for company employees and contractors who expose unlawful practices.” (Justin Hendrix / Tech Policy Press)

Tencent limited play time in its game Honour of Kings to one hour during daytime hours in an effort to placate the Chinese government. The Chinese gaming industry is bracing for a broader crackdown in light of recent comments in state media. (Josh Ye / South China Morning Post)


Amazon introduced new programs to let third-party sellers re-sell returned items as “used” following a British TV exposé revealing that the company destroyed millions of returned items a year. Here’s Sam Shead at CNBC:

Under the program, returns are automatically routed to Amazon for evaluation. Once the product is received, Amazon decides if it is: “Used - Like New, Used - Very Good, Used - Good, or Used – Acceptable.” Sellers then set the price for the item based on its condition.

Amazon said the program has been launched in the U.K., but it will be expanded to the U.S. by the end of the year. FBA Grade and Resell will be rolled out in Germany, France, Italy and Spain by early 2022.

TikTok is testing Snapchat-style stories. Sure, why not. (Sarah Perez / TechCrunch)

Google fired dozens of employees for misusing user data between 2018 and 2020, according to a leaked document. A rare look at a persistent and disturbing issue: “Eighty-six percent of all security-related allegations against employees included mishandling of confidential information, such as the transfer of internal-only information to outside parties.” (Joseph Cox / Vice)

Google introduced Google Identity Services, a new one-tap login authentication module. Looks nice! (Abner Li / 9to5Google)

Instacart hired Carolyn Everson, the former head of global advertising at Facebook, as its president. It’s the latest in a string of departures from Facebook to Instacart that also saw Facebook app lead Fidji Simo become Instacart CEO last month. (Salvador Rodriguez / CNBC)

Tinder’s revenue grew 26 percent in the most recent quarter and had 9.6 million paying customers, up 17 percent. But Match Group still missed earnings expectations, largely due to slower growth globally. (Emily Bary / Marketwatch)

Related: Match said it would add audio and video chat, including group chat, to several of its products over the next two years. (Sarah Perez / TechCrunch)

Those good tweets

Talk to me

Send me tips, comments, questions, and scraped ads: casey@platformer.news.