Last week, Google announced a plan to “build a more private web.” The announcement post was, frankly, a mess. The company that tracks user behavior on over ⅔ of the web said that “Privacy is paramount to us, in everything we do.” Google not only doubled down on its commitment to targeted advertising, but also made the laughable claim that blocking third-party cookies (by far the most common tracking technology on the web, and Google’s tracking method of choice) will hurt user privacy. By taking away the tools that make tracking easy, it contended, browser developers like Apple and Mozilla will force trackers to resort to “opaque techniques” like fingerprinting. Of course, lost in that argument is the fact that the makers of Safari and Firefox have shown serious commitments to shutting down fingerprinting, and both browsers have made real progress in that direction. Furthermore, a key part of the Privacy Sandbox proposals is Chrome’s own (belated) plan to stop fingerprinting.
But hidden behind the false equivalencies and privacy gaslighting is a set of real technical proposals. Some are genuinely good ideas. Others could be unmitigated privacy disasters. This post will look at the specific proposals under Google’s new “Privacy Sandbox” umbrella and talk about what they would mean for the future of the web.
The good: fewer CAPTCHAs, fighting fingerprints
Let’s start with the proposals that might actually help users.
First up is the “Trust API.” This proposal is based on Privacy Pass, a privacy-preserving and frustration-reducing alternative to CAPTCHAs. Instead of having to fill out CAPTCHAs all over the web, with the Trust API, users will be able to fill out a CAPTCHA once and then use “trust tokens” to prove that they are human in the future. The tokens are anonymous and not linkable to one another, so they won’t help Google (or anyone else) track users. Since Google is the single largest CAPTCHA provider in the world, its adoption of the Trust API could be a big win for users with disabilities, users of Tor, and anyone else who hates clicking on grainy pictures of storefronts.
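The tokens are unlinkable because of blinding. The issuer signs a token without ever seeing it, so when the token is later redeemed, the issuer cannot match it to the moment it was issued. The sketch below shows the idea using a textbook RSA blind signature with a deliberately tiny key. This is a swapped-in technique chosen for brevity, not Privacy Pass itself, which the original design builds on a verifiable oblivious PRF over elliptic curves. All function names are hypothetical.

```python
import hashlib
import secrets
from math import gcd

# Toy issuer key built from two Mersenne primes (2^31 - 1 and 2^61 - 1).
# Far too small to be secure; it only makes the arithmetic visible.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent


def hash_to_int(token: bytes) -> int:
    """Map a random token value to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(token: bytes) -> tuple[int, int]:
    """Client: hide the token behind a random factor r before sending it."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    blinded = (hash_to_int(token) * pow(r, E, N)) % N
    return blinded, r


def issuer_sign(blinded: int) -> int:
    """Issuer: sign once the user solves a CAPTCHA; it never sees the token."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Client: strip the blinding factor, leaving a signature on the real token."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(token: bytes, sig: int) -> bool:
    """Any site that knows the issuer's public key (N, E) can check a token."""
    return pow(sig, E, N) == hash_to_int(token)


if __name__ == "__main__":
    token = secrets.token_bytes(32)      # generated and kept by the client
    blinded, r = blind(token)            # only `blinded` goes to the issuer
    signature = unblind(issuer_sign(blinded), r)
    print(verify(token, signature))      # True, e.g. when redeemed on another site
```

The issuer only ever sees `blinded`. Because `r` is random, that value looks unrelated to the `(token, signature)` pair the client later presents.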
Google’s proposed “privacy budget” for fingerprinting is also exciting. Browser fingerprinting is the practice of gathering enough information about a specific browser instance to try to uniquely identify a user. Usually, this is accomplished by combining easily accessible information like the user agent string with data from powerful APIs like the HTML canvas. Since fingerprinting extracts identifying data from otherwise-useful APIs, it can be hard to stop without hamstringing legitimate web apps. As a workaround, Google proposes limiting the amount of data that websites can access through potentially sensitive APIs. Each website will have a “budget,” and if it goes over budget, the browser will cut off its access. Most websites won’t have any use for things like the HTML canvas, so they should be unaffected. Sites that need access to powerful APIs, like video chat services and online games, will be able to ask the user for permission to go “over budget.” The devil will be in the details, but the privacy budget is a promising framework for combating browser fingerprinting.
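One way to picture a privacy budget is as a per-site meter of identifying bits. The sketch below is a hypothetical model, not Google's design. The API names, their bit costs, the 20-bit default limit, and the rule of charging each API only once are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical identifying-entropy estimates, in bits, for a few APIs.
# Real numbers would have to be measured; these are invented for illustration.
API_COST_BITS = {
    "user_agent": 10.0,
    "screen_size": 5.0,
    "timezone": 3.0,
    "canvas_readback": 12.0,
}

DEFAULT_LIMIT_BITS = 20.0  # hypothetical per-site budget


@dataclass
class SiteBudget:
    origin: str
    limit: float = DEFAULT_LIMIT_BITS
    user_approved_overage: bool = False  # e.g. a video chat the user allowed
    spent: float = 0.0
    used: set[str] = field(default_factory=set)

    def request(self, api: str) -> bool:
        """Return True if the site may use `api`, charging each API only once."""
        if api in self.used:
            return True  # already charged; assume a repeat read reveals nothing new
        cost = API_COST_BITS[api]
        if self.spent + cost > self.limit and not self.user_approved_overage:
            return False  # over budget: the browser cuts off access
        self.spent += cost
        self.used.add(api)
        return True


if __name__ == "__main__":
    news = SiteBudget("https://news.example")
    for api in ("user_agent", "screen_size", "timezone", "canvas_readback"):
        print(api, news.request(api))
    video = SiteBudget("https://video.example", user_approved_overage=True)
    print(all(video.request(api) for api in API_COST_BITS))
```

Under these made-up numbers, a news site that reads the user agent, screen size, and time zone spends 18 bits. It is then refused canvas readback. A video chat site the user has approved can go over budget.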
Unfortunately, that’s where the good stuff ends. The rest of Google’s proposals range from mediocre to downright dangerous.
The bad: Conversion measurement
Perhaps the most fleshed-out proposal in the Sandbox is the conversion measurement API. It tackles a problem as old as online ads: how can you know whether the people clicking on an ad ultimately buy the product it advertised? Currently, third-party cookies do most of the heavy lifting. A third-party advertiser serves an ad on behalf of a marketer and sets a cookie. On its own site, the marketer includes a snippet of code that causes the user’s browser to send that cookie back to the advertiser. The advertiser therefore knows when the user sees an ad, and it knows when the same user later visits the marketer’s site and makes a purchase. In this way, advertisers can attribute ad impressions to page views and purchases that occur days or weeks later.
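The cookie-based flow above can be modeled in a few lines. The domains, the `AdServer` and `Browser` classes, and the event names in this sketch are all hypothetical. The sketch shows one thing: a third-party cookie travels with every request to the advertiser's domain, whichever site the user is on. A single random ID therefore ties the impression to the later purchase.

```python
import secrets
from collections import defaultdict
from typing import Optional


class AdServer:
    """Hypothetical third-party advertiser that logs events per cookie ID."""

    def __init__(self, domain: str):
        self.domain = domain
        self.history = defaultdict(list)  # cookie ID -> [(event, site), ...]

    def handle(self, cookie: Optional[str], event: str, site: str) -> str:
        if cookie is None:
            cookie = secrets.token_hex(8)  # first contact: mint a random ID
        self.history[cookie].append((event, site))
        return cookie  # returned as the cookie value the browser should keep


class Browser:
    """Cookie jar keyed by the domain that set each cookie."""

    def __init__(self):
        self.jar = {}

    def third_party_request(self, server: AdServer, event: str, site: str) -> None:
        # The advertiser's cookie is attached no matter which site the user
        # is visiting, which is what makes cross-site linking possible.
        cookie = self.jar.get(server.domain)
        self.jar[server.domain] = server.handle(cookie, event, site)


if __name__ == "__main__":
    ads = AdServer("ads.example")
    user = Browser()
    user.third_party_request(ads, "impression", "news.example")  # ad on a publisher
    user.third_party_request(ads, "purchase", "shoes.example")   # marketer's snippet
    print(ads.history[user.jar["ads.example"]])
    # [('impression', 'news.example'), ('purchase', 'shoes.example')]
```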
In theory, a dedicated conversion measurement API might not be so bad. Such an API could let an advertiser learn that someone saw its ad and then eventually landed on the page it was advertising. That gives raw numbers about the campaign’s effectiveness without individually identifying information. The problem is the impression data: how much information the advertiser can attach to each ad. Apple’s competing proposal for Safari allows marketers to store just 6 bits of information in a “campaign ID,” that is, one of 64 possible values (a number between 1 and 64). This is enough to differentiate between ads for different products, or between campaigns using different media.
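The difference between ID sizes comes down to pigeonhole arithmetic. A 6-bit field holds only 2^6 = 64 values, so once an advertiser serves more than 64 impressions, many of them must share each ID. A 64-bit field, like the one in Google's proposal discussed next, has room for a distinct ID on every impression. The impression count below is a hypothetical figure chosen for illustration.

```python
import math


def id_space(bits: int) -> int:
    """Number of distinct values an ID field of this width can hold."""
    return 2 ** bits


def min_share(impressions: int, bits: int) -> int:
    """Pigeonhole bound: at least one ID value is reused by this many impressions."""
    return math.ceil(impressions / id_space(bits))


def random_id_collision_chance(impressions: int, bits: int) -> float:
    """Birthday approximation: chance that any two randomly drawn IDs coincide."""
    return -math.expm1(-impressions * (impressions - 1) / (2 * id_space(bits)))


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of impressions in one campaign
    for bits in (6, 64):
        print(bits, id_space(bits), min_share(n, bits),
              random_id_collision_chance(n, bits))
```

With a million impressions, the 6-bit field forces some campaign ID to cover at least 15,625 of them. With 64 bits, randomly drawn IDs collide with a probability of roughly 3 in 100 million. An advertiser could also avoid collisions entirely by numbering impressions with a counter.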
On the other hand, Google’s ID field can contain 64 bits of information, a number between 1 and 18 quintillion. This will allow advertisers to attach a unique ID to each and every ad impression they serve, and, potentially, to connect ad conversions with individual users. If a user interacts with multiple ads from the same advertiser around the web, these IDs can help the advertiser build a profile of the user’s browsing habits.
The ugly: FLoC
Even worse is Google’s proposal for Federated Learning of Cohorts (or “FLoC”). Behind the scenes, FLoC is based on Google’s pretty neat federated learning technology. Basically, federated learning lets many users collectively train a machine learning model. Each device learns from its own data locally and shares only small model updates, a little bit of information at a time. This allows users to reap the benefits of machine learning without handing over their raw data. Federated learning systems can be configured to use secure multi-party computation and differential privacy in order to keep raw data verifiably private.
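The core loop of federated learning, in its common federated averaging form, fits in a short sketch. Each client improves a shared model on its own data and sends back only the updated parameters, which the server averages. This is a toy with a one-parameter linear model and made-up client data. It omits the secure multi-party computation and differential privacy mentioned above, and it is not Google's implementation.

```python
def local_update(w: float, data: list[tuple[float, float]],
                 lr: float = 0.01, epochs: int = 20) -> float:
    """Fit y ≈ w * x on one client's private data with plain gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, clients: list[list[tuple[float, float]]]) -> float:
    """One round of federated averaging: the server only sees each client's
    updated weight, weighted by how many examples that client trained on."""
    updates = [local_update(w, data) for data in clients]
    sizes = [len(data) for data in clients]
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)


if __name__ == "__main__":
    # Made-up private datasets on three devices, all following y = 3x.
    clients = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(0.5, 1.5), (1.5, 4.5)]]
    w = 0.0
    for _ in range(20):
        w = federated_round(w, clients)
    print(round(w, 4))  # converges toward 3.0
```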
The problem with FLoC isn’t the process, it’s the product. FLoC would cluster Chrome users based on their browsing histories. At a high level, it would study browsing patterns, generate groups of similar users, and assign each user to a group (called a “flock”). At the end of the process, each browser would receive a “flock name” that identifies it as a certain kind of web user. In Google’s proposal, users would then share their flock name, as an HTTP header, with everyone they interact with on the web.
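To show how a browser could turn its history into a single group label, here is a SimHash-style locality-sensitive hash over visited domains. SimHash is an assumption for illustration, since the proposal as summarized here does not pin down a clustering algorithm. The 8-bit output width and the domains are also hypothetical. Identical histories always land in the same flock. Histories that overlap heavily tend to get names that differ in only a few bits.

```python
import hashlib


def _domain_bits(domain: str, bits: int) -> int:
    """Stable pseudo-random bit pattern for one domain."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def flock_name(history: list[str], bits: int = 8) -> int:
    """SimHash: every distinct domain votes +1 or -1 on each output bit,
    and the sign of each tally becomes that bit of the flock name."""
    tally = [0] * bits
    for domain in set(history):
        pattern = _domain_bits(domain, bits)
        for i in range(bits):
            tally[i] += 1 if (pattern >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if tally[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bits in which two flock names differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    alice = ["news.example", "running.example", "shoes.example"]
    bob = ["shoes.example", "news.example", "running.example", "news.example"]
    carol = ["news.example", "running.example", "shoes.example", "recipes.example"]
    print(flock_name(alice), flock_name(bob), flock_name(carol))
    print(hamming(flock_name(alice), flock_name(carol)))
```

Whatever the clustering method, the output is one compact label per browser. In the proposal, that label would be attached to requests, so every site would see the same value.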
This is, in a word, bad for privacy. A flock name would essentially be a behavioral credit score: a tattoo on your digital forehead that gives a succinct summary of who you are, what you like, where you go, what you buy, and with whom you associate. The flock names will likely be inscrutable to users, but could reveal incredibly sensitive information to third parties. Trackers will be able to use that information however they want, including to augment their own behind-the-scenes profiles of users. Google says that the browser can choose to leave “sensitive” data from browsing history out of the learning process. But, as the company itself acknowledges, different data is sensitive to different people; a one-size-fits-all approach to privacy will leave many users at risk. Additionally, many sites currently choose to respect their users’ privacy by refraining from working with third-party trackers. FLoC would rob these websites of such a choice.
Furthermore, flock names will be more meaningful to those who are already capable of observing activity around the web. Companies with access to large tracking networks will be able to draw their own conclusions about the ways that users from a certain flock tend to behave. Discriminatory advertisers will be able to identify and filter out flocks which represent vulnerable populations. Predatory lenders will learn which flocks are most prone to financial hardship. FLoC is the opposite of privacy-preserving technology. Today, trackers follow you around the web, skulking in the digital shadows in order to guess at what kind of person you might be. In Google’s future, they will sit back, relax, and let your browser do the work for them.
The “ugh”: PIGIN
That brings us to PIGIN. While FLoC promises to match each user with a single, opaque group identifier, PIGIN would have each browser track a set of “interest groups” that it believes its user belongs to. Then, whenever the browser makes a request to an advertiser, it can send along a list of the user’s “interests” to enable better targeting.
Google’s proposal devotes a lot of space to discussing the privacy risks of PIGIN. However, the protections it discusses fall woefully short. The authors propose using cryptography to ensure that there are at least 1,000 people in an interest group before disclosing a user’s membership in it, as well as limiting the maximum number of interests disclosed at a time to 5. These limits don’t hold up to much scrutiny. The 1,000-person threshold applies to each group on its own, not to the combination of groups a user reveals. Membership in 5 distinct groups, each of which contains just a few thousand people, will be more than enough to uniquely identify a huge portion of users on the web. Furthermore, malicious actors will be able to game the system in a number of ways, including to learn about users’ membership in sensitive categories. While the proposal gives a passing mention to using differential privacy, it doesn’t begin to describe how, specifically, that might alleviate the myriad privacy risks PIGIN raises.
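A back-of-the-envelope model shows why a per-group threshold does not protect the combination. The sketch assumes a hypothetical population of one billion browsers. It also assumes that group memberships are independent of one another. Real interests are correlated, which would leave somewhat larger crowds, so treat the output as an illustration of how quickly intersections shrink rather than a measurement.

```python
import math

POPULATION = 1_000_000_000  # hypothetical number of browsers
THRESHOLD = 1_000           # per-group minimum described in the PIGIN proposal


def expected_lookalikes(group_sizes: list[int], population: int = POPULATION) -> float:
    """Expected number of *other* users who share every listed membership,
    assuming each user joins each group independently."""
    share = math.prod(size / population for size in group_sizes)
    return (population - 1) * share


def passes_threshold(group_sizes: list[int]) -> bool:
    """The per-group check: every group on its own is large enough."""
    return all(size >= THRESHOLD for size in group_sizes)


if __name__ == "__main__":
    for k in range(1, 6):
        groups = [5_000] * k
        print(k, passes_threshold(groups), expected_lookalikes(groups))
```

Take groups of 5,000 people. A single interest leaves about 5,000 lookalikes, two interests leave about 0.025, and five leave effectively none. That holds even though every group clears the 1,000-person threshold on its own.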
It's interesting that most of their plan to protect privacy is to build tracking into everything. I guess if we don't have any then there's nothing to protect.
Surely there's nothing wrong with letting an advertising company protect us from advertiser (and other) tracking.
Posted on Rocksolid Light