Sharpening Our Claws: Teaching Privacy Badger to Fight More Third-Party Trackers

The latest release of Privacy Badger gives it the power to detect and block a new class of evasive, pervasive third-party trackers, including Google Analytics.

Most blocking tools, like uBlock Origin, Ghostery, and Firefox's native blocking mode (using Disconnect's block lists), use human-curated lists to decide whether to block or allow third-party resources. But Privacy Badger is different. Rather than rely on a list of known trackers, it discovers and learns to block new trackers in the wild. It works using heuristics, or patterns of behavior, to identify trackers.

Last week, we updated Privacy Badger with a new heuristic to help it identify trackers that have flown under its radar in the past. Here's how it works.

What makes a tracker a tracker?

All tracker-blockers have to grapple with a fundamental question: what is a tracker, anyway? Often, it's obvious. When an ad network sets a third-party cookie and uses it to build a profile of your browsing history, it's tracking you. But other times, it's not so straightforward. Is a content delivery network (CDN) that serves static files across the web a tracker? What about a third-party image host? How do you decide what to block and what to allow?

List-based blockers have humans make those decisions case by case. The idea is to keep a list of every single domain or URL that might be tracking you on the web. EasyPrivacy, the popular tracker list used by Adblock Plus and others, has almost 17,000 entries. Creating a list like that requires tens of thousands of judgment calls by human beings, and maintaining it means making those decisions over and over again, for as long as the list is in use.

Privacy Badger takes a different approach. We try to define what tracking behavior looks like, so that any third-party domain that acts that way is likely to be a tracker, and any that doesn't is likely to be benign. Then we let the extension make the call on each request it sees. When new trackers appear on the web, or formerly benign companies start tracking users, Privacy Badger learns to block them without any help from us. If a company that doesn't intend to track its users gets blocked anyway, it can adopt a legally binding Do Not Track (DNT) policy that commits the company to respecting users' privacy. Privacy Badger will then automatically unblock that company's resources.
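
To make "learns to block" concrete, here is a toy model in Python. It assumes a domain is blocked once the heuristics have flagged it on some number of distinct first-party sites, and that a DNT policy overrides the block. The threshold of three, the class and method names, and the example domains are all hypothetical; this is a sketch of the idea, not Privacy Badger's source.

```python
from typing import Dict, Set

# Hypothetical threshold: the number of distinct first-party sites a domain
# must be seen tracking on before it is blocked. Chosen for illustration only.
BLOCK_THRESHOLD = 3


class TrackerLearner:
    """Toy model of behavior-based blocking: no block list, only observations."""

    def __init__(self, threshold: int = BLOCK_THRESHOLD) -> None:
        self.threshold = threshold
        # third-party domain -> first-party sites where it was seen tracking
        self.sightings: Dict[str, Set[str]] = {}
        # third-party domains that have adopted a DNT policy
        self.dnt_compliant: Set[str] = set()

    def record_tracking(self, third_party: str, first_party: str) -> None:
        """Called whenever a heuristic flags a request as a tracking action."""
        self.sightings.setdefault(third_party, set()).add(first_party)

    def adopt_dnt_policy(self, third_party: str) -> None:
        self.dnt_compliant.add(third_party)

    def decide(self, third_party: str) -> str:
        """Return "allow" or "block" for a request to third_party."""
        if third_party in self.dnt_compliant:
            return "allow"  # a DNT policy unblocks the domain automatically
        seen_on = self.sightings.get(third_party, set())
        return "block" if len(seen_on) >= self.threshold else "allow"


learner = TrackerLearner()
for site in ["news.example", "shop.example", "blog.example"]:
    learner.record_tracking("tracker.example", site)
print(learner.decide("tracker.example"))  # block
learner.adopt_dnt_policy("tracker.example")
print(learner.decide("tracker.example"))  # allow
```

Counting distinct first-party sites, rather than raw requests, means one site that embeds a tracker many times is not enough on its own to trigger a block.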

This makes our choice of heuristics extremely important. Trying to write rules that will identify every single tracker on the Web, present and future, is a Sisyphean task. Tracking on the Web is always evolving, and the most creative trackers will probably always be able to circumvent our detection. Our goal is to detect and block the vast majority of trackers that users encounter on a daily basis, and to make the surveillance business model a little less profitable.

Cookie Sharing: third-party tracking at a first-party price

For most of its history, Privacy Badger has used three main heuristics to identify tracking behavior:

  • Third-party cookies. Cookies are the simplest and most common tracking tools on the web. Privacy Badger considers a domain to be tracking if it sets a third-party cookie with enough information to uniquely identify an individual user. In our own experiments, we've found that around 98% of the tracking activity identified by Privacy Badger uses third-party cookies.

  • Local storage supercookies. Third-party domains that can run JavaScript are able to set values in the browser's local storage, then retrieve them later to track user activity across sites. Because these values act a lot like cookies but can evade common cookie-blocking tactics, they are sometimes referred to as "supercookies." Privacy Badger looks for reads and writes of large amounts of information to third-party local storage and marks those as tracking.
  • Canvas fingerprinting. Trackers can also use JavaScript to try to extract a browser fingerprint, a value that can uniquely identify your device without the use of stored values like cookies. Privacy Badger looks for some of the most common kinds of fingerprinting using the HTML canvas, and marks those actions as tracking.
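
The first heuristic hinges on whether a cookie carries "enough information to uniquely identify an individual user." One hypothetical way to approximate that is to estimate how many bits a cookie value could encode. The character-class estimate and the 32-bit cutoff below are assumptions for illustration, not Privacy Badger's actual rule.

```python
import math
import string

# Hypothetical cutoff: roughly 4 billion distinguishable values, enough to
# tell users apart. Not Privacy Badger's actual threshold.
MIN_IDENTIFYING_BITS = 32


def estimate_bits(value: str) -> float:
    """Crude upper bound on the information a cookie value can carry.

    Assumes every character is drawn uniformly from the character classes
    that appear in the value (lowercase, uppercase, digits, punctuation).
    """
    classes = [string.ascii_lowercase, string.ascii_uppercase,
               string.digits, string.punctuation]
    alphabet = sum(len(cls) for cls in classes
                   if any(ch in cls for ch in value))
    if alphabet == 0:
        return 0.0
    return len(value) * math.log2(alphabet)


def looks_identifying(value: str) -> bool:
    return estimate_bits(value) >= MIN_IDENTIFYING_BITS


print(looks_identifying("true"))      # False: about 19 bits
print(looks_identifying("abcd1234"))  # True: about 41 bits
```

A value like "true" or "en" cannot single anyone out, while a random-looking ID of eight or more alphanumeric characters easily can.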

These heuristics help Privacy Badger identify and block the majority of tracking requests on the web. But a while ago, we noticed that one particularly notorious data collector was evading our filters: Google Analytics.

Because Google Analytics doesn't use third-party cookies, local storage supercookies, or browser fingerprinting to collect data about users, it wasn't caught by any of Privacy Badger's existing heuristics. However, it is a silent passenger on a huge portion of the Web, collecting information about users and sending that data back to Google. Moreover, Google Analytics is included on nearly every popular human-curated block list, including EasyPrivacy and Disconnect. Any intuitive definition of tracking probably includes what Google Analytics does, but Privacy Badger's definition didn't.

What Google Analytics does use is cookie sharing, a tracking technique most often used by third-party analytics services. It can also help trackers sidestep restrictions on third-party cookies, like Safari's Intelligent Tracking Prevention (ITP) and Firefox's default content blocking.

It works like this: When you visit a website, the page loads a piece of JavaScript from a third-party server. That JavaScript runs in a first-party context and sets a cookie associated with the first-party domain. Your browser allows the third-party JavaScript (running as part of the first-party page) to read and update the cookie. Then, the JavaScript sends off a request to the third-party tracker. Normally, cookies are sent automatically alongside requests, and the browser controls who sees which cookies: it wouldn't allow a first-party cookie to be sent to a third party like Google. However, since Google's script can access the cookie, it can stick the cookie value right into the request itself (specifically, into the query string of the request URL). Google receives the identifier from the first-party cookie and uses it to link the request back to a user profile.

A diagram depicting how cookie sharing works: first, a user requests a page from a web server; second, the server responds with a web page; third, JavaScript on the page requests first-party cookies; fourth, the cookie store responds with an identifying cookie ("id=abcd1234"); fifth, the JavaScript creates a tracking pixel; finally, the tracking pixel sends a request to the third-party tracker, and the URL parameters include the identifying cookie "id=abcd1234".

Cookie-sharing trackers, including Google Analytics, often rely on tracking pixels to work. A tracking pixel is typically an invisible, 1x1 image that is placed on a web page for the sole purpose of triggering a request to a third party. That means most cookie sharing is undetectable to all but the most tech-savvy users.
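
The script's side of the exchange can be modeled in a few lines. In a real page this logic is JavaScript reading document.cookie and creating an image element; the Python below only builds the pixel URL. The endpoint tracker.example, the cookie name id (borrowed from the diagram's "id=abcd1234"), and the parameter names cid and page are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional
from urllib.parse import urlencode

# Hypothetical tracker endpoint, cookie name, and parameter names.
TRACKER_PIXEL = "https://tracker.example/collect"
ID_COOKIE = "id"


def build_pixel_url(first_party_cookies: str, page_url: str) -> Optional[str]:
    """Copy a first-party cookie value into a third-party pixel URL.

    first_party_cookies is the "name=value; name2=value2" string a script
    running in the page would read from document.cookie.
    """
    jar = SimpleCookie()
    jar.load(first_party_cookies)
    if ID_COOKIE not in jar:
        return None
    # The browser would never attach this first-party cookie to a third-party
    # request, so the script puts its value into the query string instead.
    query = urlencode({"cid": jar[ID_COOKIE].value, "page": page_url})
    return f"{TRACKER_PIXEL}?{query}"


print(build_pixel_url("id=abcd1234; theme=dark", "https://news.example/"))
# https://tracker.example/collect?cid=abcd1234&page=https%3A%2F%2Fnews.example%2F
```

Loading that URL as a 1x1 image is all it takes: the request carries the identifier to the third party even though no third-party cookie was ever set.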

On its own, cookie sharing is usually not as effective at tracking users as traditional third-party cookies. Because first-party sites don't share cookie data with each other, a third-party tracker will associate the same user with a different cookie value on each first-party site the user visits. This makes it more difficult for the tracker to link data from different first-party sites to the same user. But if the tracker sees requests from the same user on two different first-party sites in rapid succession, it can use other identifying information, like IP address or TLS state, to link the different cookie values to the same user. By some measures, Google Analytics is present on as much as 80% of the Web, so it gets information about nearly every site most users visit, and linking those identities together is a cinch. But don't take our word for it; Google's privacy policy says as much (emphasis ours):

Google Analytics relies on first-party cookies, which means the cookies are set by the Google Analytics customer. Using our systems, data generated through Google Analytics can be linked by the Google Analytics customer and by Google to third-party cookies that are related to visits to other websites.

Let's look at how this might work in practice. Imagine a user browses to a handful of sites over the course of a day, each of which has the same cookie-sharing tracker on it. The table below shows the information that the tracker gets from each visit.

Table: shared cookie ID and IP address received by the tracker for visits at 10:25 am, 10:46 am, 2:41 pm, 2:55 pm, and 3:02 pm.


On the first two sites the user visits, the tracker sets a first-party cookie in the user's browser that is unique to each first-party site. The user's ID cookie is abc123 on The Guardian's website, and def456 on the second site. But because the user visits the two sites in rapid succession, their IP address stays the same, so the tracker can infer that the abc123 user and the def456 user are one and the same.

Later in the day, the same user gets back online and visits Newegg's and Planned Parenthood's websites. The user's IP address has changed, so the tracker doesn't know that the cba321 and xyz789 identities point to the same user as before. However, the user then revisits The Guardian's website, and the tracker sees another request from user abc123 coming from the new IP address. This tells the tracker that the requests from Planned Parenthood and Newegg probably came from the same user, and lets it link all the recent activity it's seen back to a single identity.
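
The tracker's inference in this example can be sketched with a union-find structure: any two cookie IDs seen from the same IP address within a short window are merged into one identity. The 30-minute window, the IP addresses (taken from reserved documentation ranges), and which of cba321 and xyz789 was seen first are assumptions for illustration; real trackers may combine many more signals, such as TLS state.

```python
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical window: two visits from one IP address this close together are
# assumed to come from the same person.
WINDOW = timedelta(minutes=30)


def link_identities(visits: Iterable[Tuple[datetime, str, str]]) -> Set[FrozenSet[str]]:
    """Group cookie IDs that the tracker can tie to one person.

    visits holds (time, cookie_id, ip_address) tuples; the result is a set of
    groups, each a frozenset of cookie IDs believed to belong to one user.
    """
    parent: Dict[str, str] = {}  # union-find forest over cookie IDs

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    latest_on_ip: Dict[str, Tuple[datetime, str]] = {}
    for time, cookie_id, ip in sorted(visits):
        find(cookie_id)  # register every ID, even ones never linked
        if ip in latest_on_ip:
            prev_time, prev_id = latest_on_ip[ip]
            if time - prev_time <= WINDOW:
                union(cookie_id, prev_id)
        latest_on_ip[ip] = (time, cookie_id)

    groups: Dict[str, Set[str]] = {}
    for cookie_id in parent:
        groups.setdefault(find(cookie_id), set()).add(cookie_id)
    return {frozenset(g) for g in groups.values()}


def t(hhmm: str) -> datetime:
    return datetime.strptime(hhmm, "%H:%M")


# Cookie IDs and times from the example; IP addresses are placeholders.
visits = [
    (t("10:25"), "abc123", "203.0.113.7"),    # The Guardian
    (t("10:46"), "def456", "203.0.113.7"),
    (t("14:41"), "cba321", "198.51.100.23"),
    (t("14:55"), "xyz789", "198.51.100.23"),
    (t("15:02"), "abc123", "198.51.100.23"),  # The Guardian again, new IP
]
print(link_identities(visits[:4]))  # two separate identities
print(link_identities(visits))      # one identity holding all four IDs
```

Before the 3:02 pm visit the tracker holds two identities; the single repeat visit to The Guardian is enough to merge them.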

Detecting Cookie Sharing

In the latest update, Privacy Badger has a new heuristic to detect cookie sharing. Every time it sees a third-party request, it runs a series of checks:

  1. Is the request an image request? The majority of cookie sharing uses 1x1 pixel images, so we ignore other kinds of requests.
  2. Does the request URL have query arguments? These are usually used to convey extra information with an image request.
  3. Do any of the query arguments contain a long segment of a first-party cookie? This is what we're really looking for. If any of the query arguments have a large chunk of information (8 characters or more) in common with any of the first-party cookies on the page, the request is probably trying to share a tracking cookie.

If all of the above conditions match, Privacy Badger logs the request as a tracking action.
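
The three checks translate fairly directly into code. This Python sketch assumes the request type arrives as a string such as "image" (the label browser webRequest APIs use for image loads), that first-party cookies are given as a name-to-value mapping, and that query arguments are compared against cookie values. The function names are hypothetical, and this is not Privacy Badger's actual JavaScript implementation.

```python
from typing import Mapping
from urllib.parse import parse_qsl, urlsplit

MIN_OVERLAP = 8  # "8 characters or more"


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest run of characters shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_cookie_sharing(request_type: str, url: str,
                      first_party_cookies: Mapping[str, str]) -> bool:
    """Apply the three checks to one third-party request."""
    # 1. Only image requests count; most cookie sharing uses 1x1 pixels.
    if request_type != "image":
        return False
    # 2. The URL must carry query arguments (percent-decoded by parse_qsl).
    args = parse_qsl(urlsplit(url).query)
    if not args:
        return False
    # 3. Some argument value must share a long run of characters with some
    #    first-party cookie value.
    return any(
        longest_common_substring(arg_value, cookie_value) >= MIN_OVERLAP
        for _name, arg_value in args
        for cookie_value in first_party_cookies.values()
    )


cookies = {"id": "abcd1234", "theme": "dark"}
print(is_cookie_sharing("image", "https://tracker.example/p.gif?cid=abcd1234", cookies))  # True
print(is_cookie_sharing("script", "https://tracker.example/t.js?cid=abcd1234", cookies))  # False
print(is_cookie_sharing("image", "https://tracker.example/p.gif?v=abcd", cookies))        # False
```

Matching a shared substring, rather than requiring the whole cookie value to appear, catches trackers that copy only part of a cookie into the URL; the quadratic routine is fine for values as short as cookies and query arguments.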

After building the new heuristic, we tested it using Badger Sett, our in-house tool for scanning the web with Privacy Badger. We scanned the top 10,000 first-party websites on the Majestic Million and recorded the number of times each third-party domain was logged taking a particular tracking action. This let us see which new domains Privacy Badger will learn to block using the new heuristic, and make sure it doesn't mark too many benign requests as tracking.
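
Ranking domains from a scan like this amounts to counting, for each third-party domain, the distinct first-party sites where it was logged taking a given action. The sketch assumes a flat list of (first-party site, third-party domain, action) records; that record format and action labels such as "cookie_share" are hypothetical, since Badger Sett's output format isn't described here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def top_trackers(observations: Iterable[Tuple[str, str, str]],
                 action: str, n: int = 5) -> List[Tuple[str, int]]:
    """Rank third-party domains by how many first-party sites they took `action` on.

    observations holds (first_party_site, third_party_domain, action) records.
    """
    sites: Dict[str, Set[str]] = {}
    for first_party, third_party, seen_action in observations:
        if seen_action == action:
            sites.setdefault(third_party, set()).add(first_party)
    ranked = sorted(((domain, len(s)) for domain, s in sites.items()),
                    key=lambda item: (-item[1], item[0]))  # most sites first
    return ranked[:n]


scan = [
    ("a.example", "t1.example", "cookie_share"),
    ("b.example", "t1.example", "cookie_share"),
    ("a.example", "t1.example", "cookie_share"),  # repeat on the same site
    ("a.example", "t2.example", "cookie_share"),
    ("c.example", "t3.example", "third_party_cookie"),
]
print(top_trackers(scan, "cookie_share"))  # [('t1.example', 2), ('t2.example', 1)]
```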

The table below shows the five domains that Privacy Badger newly identified as tracking on the most sites:

Table: tracking domain, number of first-party sites the domain was seen tracking on, and type of service (entries include third-party analytics, identity resolution, and analytics/market research).


Google Analytics is by far the most common tracker identified by the new heuristic, but all of the top five are what we would consider trackers. Four of the five are included on the Disconnect blocklist (the list used by Firefox's content-blocking feature): Google Analytics, Chartbeat, Nexac, and Amazon's Alexa Metrics. The fifth, BounceX, advertises itself as a service to "accurately recognize and market to the actual person behind every visit in real-time." Sounds an awful lot like tracking to us.

Track Changes

The techniques used by trackers are always evolving, so Privacy Badger's countermeasures have to evolve, too. In the process of developing the new cookie-sharing heuristic, we learned more about how to evaluate and iterate on our detection metrics. As a result, Privacy Badger is stronger than ever. When the next generation of corporate surveillance technology hits the web, we'll be ready.
