How comfortable would you feel sending Facebook a scan of your passport? As a privacy-conscious, infrequent user of the platform, I felt some consternation at the prospect.
But in order to gain full access to the data in Facebook’s Ad Library — a database of political advertisements that have appeared on the social network — one must complete the same identity verification process as those wishing to place political ads. The lure of such a resource is irresistible to data journalists; consequently, I found myself providing Facebook with more personal information than I would previously have countenanced.
Political ads on Facebook are the cause of much concern and intrigue, having been strategically deployed, most notably by Russia, to influence elections around the world. The social network launched an “Ads Archive” last year after its slow-motion disclosure that at least 150m Americans saw politically divisive ads traced to a now-infamous Kremlin-adjacent “troll farm” during the 2016 presidential campaign.
The original archive lacked most of the features necessary for the large-scale analysis of ads data, including the ability to download data. Researchers were hopeful, though, when they saw a tantalising nugget buried in the archive announcement: “We’re working closely with . . . stakeholders to launch an API for the archive.”
Programmatic access enables researchers to perform more precise searches of the Ad Library and retrieve data in bulk, without requiring Facebook to provide such functionality via a website.
A small group of researchers were invited to test a beta version of the API last August, while the only solutions available to others seeking access to bulk ads data were databases operated by third parties. The databases are compiled using web browser extensions that capture and upload ads data as a user browses Facebook. One such database is operated by advocacy group Who Targets Me, which campaigns for transparency in political advertising.
With a user base of some 23,000, WTM amassed a considerable amount of data before realising something had changed.
“The first thing we noticed, around November, was that [Facebook] started hiring for all these anti-scraping positions”, said WTM co-founder Louis Knight-Webb. “Scraping” is the automated, usually bulk retrieval of data from websites not specifically intended to provide data by such means.
Mr Knight-Webb said that by January of this year, a single change to the underlying code of Facebook ads meant that the WTM browser extension could no longer automatically capture data on the groups at which ads were being targeted — though that information was still available by manually clicking a link on each ad.
Facebook’s director of product, Rob Leathern, said on Twitter on January 28 that the change was aimed at “preventing people’s data from being misused”.
But Mr Knight-Webb was unconvinced. “[WTM’s] value beyond the Ad Archive was that we gave you that data,” he said. “When we lost that, researchers and journalists lost the ability to see why certain interests or demographics are being targeted.”
In February, the Mozilla Foundation published an open letter to Facebook criticising its crackdown on scraping and calling on it to fulfil its commitments to increase transparency around political advertising. WTM was among the 32 campaign groups to co-sign the letter.
Within hours, Mr Leathern again took to Twitter to respond that the ads API would be made publicly accessible “in late March”.
An email inviting me to begin the process of gaining API access arrived on schedule. Having provided my passport and home address, received a verification code through the post and activated two-factor authentication on my Facebook account, I was informed that my identity had been confirmed. Some 130 lines of Python code later, I was able to sit back and watch the text of thousands of ads stream down my laptop screen.
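For readers curious what such a script involves, here is a minimal sketch of the general approach, not the code used for this article. It assumes the API is served from Facebook's Graph API at an `ads_archive` endpoint, that results arrive in pages with a `paging.next` link, and that the parameter names shown (`search_terms`, `ad_reached_countries`, `ad_type`, `ad_active_status`, `fields`) match the current documentation. The version string, the helper names `build_url`, `fetch_json` and `iter_ads`, and the choice of fields are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint path and version; check Facebook's documentation.
API_URL = "https://graph.facebook.com/v3.2/ads_archive"


def build_url(token, search_terms, country, fields, limit=100):
    """Build the first request URL for a search (hypothetical helper)."""
    params = {
        "access_token": token,
        "search_terms": search_terms,
        # Assumption: the country list is passed as a JSON array.
        "ad_reached_countries": json.dumps([country]),
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        "ad_active_status": "ALL",
        "fields": ",".join(fields),
        "limit": limit,
    }
    return API_URL + "?" + urlencode(params)


def fetch_json(url):
    """Download one page of results and decode it from JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_ads(first_url, fetch=fetch_json):
    """Yield ads one at a time, following 'next' links until none remain."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = page.get("paging", {}).get("next")
```

With a valid access token, something like `for ad in iter_ads(build_url(token, "brexit", "GB", ["page_name", "ad_creative_body"])): print(ad)` would print each ad as it arrives. The `fetch` argument exists only so the paging logic can be tried out without a network connection.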
Through the API, I am now able to retrieve data on millions of ads reaching countries including the US, India and the 28 EU member states. In the case of the UK, this amounts to 51,000 ads since October 2018 alone, though this falls some way short of the 82,000 that Facebook reports to be in the archive. A spokesperson confirmed the company is working to resolve a bug affecting retrieval of the full database.
Regardless, and thanks in part to the work of campaigners like WTM, researchers finally have programmatic access to a wealth of ads data from the primary source. It strikes me, though, that something crucial has been lost: neither the Ad Library web interface nor the API provides targeting data, only high-level breakdowns of ad viewers by age group, gender and region. The data captured by browser extensions like WTM’s were vastly more granular, revealing the highly specific subgroups upon which microtargeting depends.
That is not to say such data will never materialise. A Facebook spokesperson said: “We’re taking feedback, and learning and improving our tools to make them more useful. We recently kicked off a formal feedback process with organisations using the API so we can prioritise future features.”
Mr Knight-Webb said he is optimistic, though he cautioned that “it is much easier for [Facebook] to defend the platform now, even though there is a lot of work to be done.”
“I’m concerned that we could have the perception of transparency rather than actual transparency,” he said.
David Blood is a data journalist at the Financial Times
What data are available from the Facebook Ad Library API?
Whether the ad is currently active or not
The countries reached by the ad
The ad category (eg “political and issue” or “news”)
The date and time that the ad was created
The text displayed in the ad (the “creative body”)
Details of any link contained in the ad
The date and time that delivery of the ad began and stopped
A link to an image of the ad as it would have appeared on Facebook
The currency used to pay for the ad
A demographic breakdown of people who viewed the ad by age, gender and region
The name of the person, company or other organisation that “provided funding” for the ad “as submitted by the purchaser of the ad”
The number of views, or “impressions”, received by the ad and the amount spent on the ad, each expressed as a range, eg 0-999 impressions and £100-£499 spent
The name and unique ID of the Facebook page associated with the ad
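Because impressions and spend are reported only as ranges, any total across several ads is itself a range rather than a single figure. The sketch below illustrates the arithmetic. It assumes each range arrives as an object with `lower_bound` and `upper_bound` values supplied as strings, and that the top bracket may have no upper bound. Those field names are assumptions, the sample records are invented for illustration, and `total_range` is a hypothetical helper.

```python
def total_range(ads, field):
    """Sum a range-valued field across ads.

    Returns (lowest possible total, highest possible total). The upper
    total is None if any ad has no upper bound for the field.
    """
    lower = 0
    upper = 0
    for ad in ads:
        bounds = ad[field]
        lower += int(bounds["lower_bound"])
        if upper is not None:
            if "upper_bound" in bounds:
                upper += int(bounds["upper_bound"])
            else:
                upper = None  # open-ended bracket: no ceiling on the total
    return lower, upper


# Invented sample records, shaped as assumed above.
sample_ads = [
    {"impressions": {"lower_bound": "0", "upper_bound": "999"},
     "spend": {"lower_bound": "100", "upper_bound": "499"}},
    {"impressions": {"lower_bound": "1000", "upper_bound": "4999"},
     "spend": {"lower_bound": "0", "upper_bound": "99"}},
]

print(total_range(sample_ads, "impressions"))  # (1000, 5998)
print(total_range(sample_ads, "spend"))        # (100, 598)
```

The two sample ads together were seen between 1,000 and 5,998 times, at a cost of between 100 and 598 units of the currency. The width of those ranges is the price of the API's imprecision, and it grows with every ad added to the total.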