Cross-domain Tracking: An Overview

Back in the day
So to get started, I should address what online advertising used to be, and what it is often still thought to be.

Back when the web was still a few years younger, advertising used to be done by having a website owner contact, or be contacted by, an advertiser: someone who wants to display their advertisement. The advertiser would offer a given amount of money to the website owner in exchange for the advertisement being displayed (each display is called an impression by advertisers).

There were (and still are) multiple ways to choose how to pay for an ad when it's displayed:

- Cost Per Mille (CPM): A given amount of money is given for every 1,000 impressions.
- Cost Per Click (CPC): A given amount of money is given every time a user clicks on an ad.
- Cost Per Action (CPA): A given amount of money is given every time a user performs a given action (such as filling a form, buying a product, etc.) after seeing an ad.
- A one-time sum for some design alteration: Microsoft used to do this, for example, when they released Vista after the long pause that followed XP and its service packs. They'd go to a website and offer a given amount (say $80,000 per week, if you're a big website) to have your website's design changed to reflect Vista branding.

This could be tracked by each advertiser individually, and deals could be struck between marketing or sales teams.
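As a rough sketch of the arithmetic behind the first three payment models (the function names, rates and volumes are all made up for illustration):

```python
def cpm_cost(impressions: int, rate_per_thousand: float) -> float:
    """Cost Per Mille: a fixed rate for every 1,000 impressions."""
    return impressions / 1000 * rate_per_thousand


def cpc_cost(clicks: int, rate_per_click: float) -> float:
    """Cost Per Click: a fixed rate for every click on the ad."""
    return clicks * rate_per_click


def cpa_cost(actions: int, rate_per_action: float) -> float:
    """Cost Per Action: a fixed rate for every completed action."""
    return actions * rate_per_action


# Hypothetical campaign: 200,000 impressions, 400 clicks, 10 purchases.
print(cpm_cost(200_000, 2.00))  # 400.0
print(cpc_cost(400, 0.50))      # 200.0
print(cpa_cost(10, 25.00))      # 250.0
```

The same traffic costs a different amount under each model, which is why the tracking behind each metric matters to whoever is paying.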

So one problem of this approach was technological: not all advertisers have the means to do the tracking required for all these metrics, and most people only want to say "this is my ad, show it with this budget and this intent".

Eventually this led to ad agencies and marketing agencies acting as a liaison between advertisers and websites. They took care of all the complex technical stuff and bridged the two sides: they could give websites the volume of advertisement required to fill all their page views, and they could give advertisers the reach they wanted.

Primitive Ad Exchanges are Born
My timeline is a bit shaky there and a lot of these things happened more or less organically. Eventually, names like Yahoo! figured out that you could make everything work like a stock market or an auction system if you had systems fast enough with low enough latencies.

The general idea was that instead of having pre-made deals with given rates and whatnot, it would be far more interesting to have websites put a piece of script on their pages, which directs the user to an ad network's script (see: Yahoo, AOL, Google, Facebook, OpenX, AppNexus, etc.).

The ad network's script then sees that a given user is on the page and sends an event to a bunch of buyers (called bidders) on their market. The event will contain information such as "Someone visited website at URL X." The bidders could then decide how much they thought the advertisement was worth in this specific case, and send back a bid for that amount (often in picodollars).

After a limited period of time (usually below 100 milliseconds), the ad exchange would look at all the bid responses it had received and pick a winner: the highest bidder, who would often be charged the second-highest bid price.
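The winner selection described above can be sketched as a second-price auction. This is only an illustration: the `Bid` type, the integer price unit, and the handling of a single bid are my assumptions, and real exchanges add things like reserve prices and timeouts that are left out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    price: int  # in whatever smallest unit the exchange uses (assumption)


def run_second_price_auction(bids: list[Bid]) -> Optional[tuple[str, int]]:
    """Return (winning bidder, price to pay), or None if nobody bid."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.price, reverse=True)
    winner = ranked[0]
    # With a single bid there is no second price; this sketch simply
    # charges the winner their own bid in that case.
    clearing_price = ranked[1].price if len(ranked) > 1 else winner.price
    return winner.bidder, clearing_price


bids = [Bid("bidder-a", 300), Bid("bidder-b", 500), Bid("bidder-c", 450)]
print(run_second_price_auction(bids))  # ('bidder-b', 450)
```

The highest bidder wins but pays the runner-up's price, which is why bidders can afford to bid what an impression is actually worth to them.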

Ad Exchanges Grow
And this is where online advertisement could boom. Because there now was a way to reach a lot of potential buyers, and potential buyers had a large inventory to pick from with increasingly more accurate needs, things started growing more and more. The following capabilities ended up present in the systems (some were always there, some not):

- The user is identified by some random ID by the ad exchange, and the ID is relayed to bidders, letting them anonymously track who they have or have not seen.
- Sometimes the exchange has information on the user in the form of geolocation data (either taken from an IP address, or from some geolocation support the user opted into, etc.), language data (submitted by the browser on every request), device data (using the user-agent a browser sends, you can figure out if the user is using a mobile device or a computer, and with which browser), and so on.
- The page the user is visiting may come with information about its content: for example, its verticals (broad topic: news, cars, sports, etc.), whether it's an adult or porn page, etc.
- Other similar information, but usually nothing that can identify the user.

Some ad exchanges had different levels of content available, with varying sources of information. This led buyers to look for multiple ad exchanges. In turn, this made things more and more complex for the buyers, so we started seeing businesses that specialized in buying and bidding on advertisement spots. Now the chain looked a bit like this:

[Website] <--- [Ad Exchange] <---> [Bidder] <--- [Advertiser]

Where there are many websites and many exchanges. In some cases, the Bidder and the Advertiser were represented by the same Marketing firm. Everyone down the pipeline takes a small tiny cut. At some point in time, someone figured out that because your customers all have a website, that website could be used as a source of information:

[Website] <--- [Ad Exchange] <---> [Bidder] <--- [Advertiser] <--- [Client's Website]

So came the idea to enrich the bidder's decision with data from the many clients' websites. If your client's website could tell you that a user enjoyed cars (because the client is a car dealership), and you knew that the website the user is currently on was about cars, you could figure out you'd have a good chance of making an effective sale of car advertisement from your client to that user on that website right now.
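A minimal sketch of that kind of enriched decision, assuming the bidder already holds a set of interests learned from client websites and receives the page's verticals in the bid request. The function name, the base bid and the multiplier are all hypothetical:

```python
def choose_bid(page_verticals: set[str], user_interests: set[str],
               base_bid: int = 100, interest_multiplier: int = 5) -> int:
    """Bid more when the page topic overlaps what we know the user likes."""
    if page_verticals & user_interests:
        return base_bid * interest_multiplier
    return base_bid


# The client (a car dealership) told us this user is into cars.
known_interests = {"cars"}

print(choose_bid({"cars", "news"}, known_interests))  # 500
print(choose_bid({"sports"}, known_interests))        # 100
```

Real bidders would rely on models rather than a fixed multiplier; the point is only that client-side data changes what an impression is worth.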

Even nicer? If you knew the user was looking for a car on your client's site, and you see they're still browsing more car content, they're probably actively looking for a car, so you should pay more for a very targeted ad campaign. This is called retargeting.

These are the usual types of campaigns:

- Awareness: show as many ads as possible to as many people as possible. The objective is to get your brand or product known. This is a spray and pray model, with cheap bid prices to get more impressions for the same budget. This is usually called optimizing for the Cost Per Impression (CPI).
- Retargeting: grab a user who is potentially interested. Bidders pay a lot more for these, but also have a lot less volume to go with.
- Optimizing for Click-Through Rates (CTR): trying to balance budgets so that you end up paying the least amount for the most clicks on advertisements.
- Others

So how does that take place?

Cookie Exchange
Cookie Exchange is where the information gets traded. Rather than having dirty backroom deals where advertiser A contacts advertiser B and says "I have information on all these users, let's synchronize our databases", cookie exchange takes place on a more 'organic' (still forced) level; it only takes place when the user visits a given page.

This is done in two parts.

1. Between the Customer's Website and the Advertiser/Bidder

Whenever a user visits a customer's website, the user may or may not have a cookie in place. If I'm visiting a dating site, I may have a session cookie which is exclusive between the website and myself. If I remove it, I get logged out.

Cookies in the browser are strictly restricted to specific domains, so that if I'm on dating-site.com, the cookie cannot be read by dating-site-advertiser.com. Instead, the dating site will say to their advertiser: "okay, I have the following information on a user: gender, sexual orientation, age group, lists of things they like, a given id, geographical data. What do you need?"

The advertiser can decide to pick only stuff like gender, age group and geo data, or may grab more than they need. This is usually decided on a case-by-case basis, and the bigger the advertiser, the harder it is for them to be super specific about which special-case information to grab. So it's somewhat unlikely that the bidder will look for things like "favorite ice-cream brand" or "what kind of car do they buy" and stuff like that. They'll go for broad definitions that work better with their data analytics and models.

Once that's more or less agreed on, the website will put a bit of code on their pages that contains the information to be shared. That code displays a transparent 1x1 image with a URL like dating-site-advertiser.com/pixel.png?user_id=1309123&agrp=20-35&r=ca-qc&... and so on.

The advertiser sees this, stores the information, and then, along with their pixel (to display it), attaches their own cookie that may contain information as basic as "ad-exchange-user-id:403232423". There needs to be little information there; the matching is then done in a database, behind closed doors.

So the bidder has just established a link between how they track a user themselves, where that user came from, and what kind of data is known about them.
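Both halves of this first part can be sketched roughly as follows. The parameter names come from the example URL above; the `https://` scheme, the in-memory dictionaries and the function names are assumptions for illustration, not how any particular advertiser does it.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

PIXEL_ENDPOINT = "https://dating-site-advertiser.com/pixel.png"


def build_pixel_url(site_user_id: str, attributes: dict[str, str]) -> str:
    """Website side: encode the agreed-upon data into the pixel URL."""
    params = {"user_id": site_user_id, **attributes}
    return f"{PIXEL_ENDPOINT}?{urlencode(params)}"


# Advertiser side: stand-ins for the database "behind closed doors".
site_to_cookie: dict[str, str] = {}        # website user id -> our cookie id
profiles: dict[str, dict[str, str]] = {}   # our cookie id -> stored attributes


def handle_pixel_request(url: str, cookie_id: Optional[str]) -> str:
    """Store the shared data and return the cookie id to set on the response."""
    query = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    site_user_id = query.pop("user_id")
    if cookie_id is None:
        # First time we see this browser: mint a new opaque id.
        cookie_id = secrets.token_hex(8)
    profiles.setdefault(cookie_id, {}).update(query)
    site_to_cookie[site_user_id] = cookie_id
    return cookie_id


url = build_pixel_url("1309123", {"agrp": "20-35", "r": "ca-qc"})
cookie = handle_pixel_request(url, cookie_id=None)
print(url)
print(profiles[cookie])
```

The cookie itself only carries an opaque id; everything else lives in the advertiser's own storage.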

2. Between the Advertiser/Bidder and the Ad Exchange

Whenever a bid request makes it to the bidder, and the bidder wins the auction, the bidder can then display an advertisement. The advertisement will usually be an image, a Flash container, a bit of text, etc.

If you recall, ad networks often submit a non-identifiable user id with the bid requests. What you can do is decide to remember which requests you bid on, see when you win, and when you win, create a matching:

- Did the user 9392131 from network X have a cookie with us? Yes! It was user 403232423! We can now know that 403232423 is also X:9392131 and know when we see that person again in the future.
- Did we not know him? Well let's give that guy a cookie all the same, and now we have a way to match their data internally.

And now that same mechanism can be used with the customer website to further create more accurate identities.
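The matching step above boils down to a lookup table keyed by network and exchange user id. A minimal sketch, where the table, the function name and the way new ids are generated are all assumptions:

```python
import itertools
from typing import Optional

# (network, exchange user id) -> our own cookie id
exchange_to_own: dict[tuple[str, str], str] = {}

# Hypothetical id generator for users we have never seen before.
_new_ids = itertools.count(500000000)


def match_on_win(network: str, exchange_user_id: str,
                 own_cookie_id: Optional[str]) -> str:
    """Called when we win an auction and get to serve the ad."""
    if own_cookie_id is None:
        # We did not know this user: give them a cookie all the same.
        own_cookie_id = str(next(_new_ids))
    exchange_to_own[(network, exchange_user_id)] = own_cookie_id
    return own_cookie_id


# The browser already carried our cookie 403232423.
match_on_win("X", "9392131", "403232423")
print(exchange_to_own[("X", "9392131")])  # 403232423
```

On later bid requests from network X for user 9392131, the bidder can look up its own id before deciding how much to bid.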

Now it's kind of annoying to track all the user IDs of all the stuff you bid on. Ad networks are high-volume enough that you see billions of requests a day, with hundreds of thousands of bids attempted, but maybe a few hundred to many thousands that win. Instead of tracking all that stuff by hand, ad exchanges have cookie exchange services.

Basically, instead of displaying your ad directly and having you do the matching, they'll do that part themselves.

The trick is that to display an advertisement, a website will put a snippet of code (say ), which will load the code required to launch the whole auction system.

When someone wins, the ad exchange is the one to pick how to inject and display the code in the page. Therefore, they're able to do a thing such as including the winner with:

somewinner.com/adcampaign/submitted-image.jpg?price=2103912&...

And other details that are usually signed cryptographically (this is the only way to know if you won an ad -- eventually people started gaming these and trying to create fake ones or replay them to cause trouble for their competing buyers. It's a dirty game).
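To give an idea of what "signed cryptographically" can look like, here is a sketch that uses an HMAC over the query parameters with a secret shared between exchange and bidder. This is an assumption on my part: exchanges each have their own scheme, and the parameter names (`auction_id`, `sig`), the secret and the `https://` scheme are made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

SHARED_SECRET = b"hypothetical-secret-agreed-with-the-exchange"


def _signature(payload: str) -> str:
    return hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def sign_win_url(base_url: str, params: dict[str, str]) -> str:
    """Exchange side: append a signature covering every parameter."""
    payload = urlencode(sorted(params.items()))
    return f"{base_url}?{payload}&sig={_signature(payload)}"


def verify_win_url(url: str) -> bool:
    """Bidder side: recompute the signature and compare in constant time."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    received = next((value for key, value in pairs if key == "sig"), "")
    payload = urlencode(sorted((k, v) for k, v in pairs if k != "sig"))
    return hmac.compare_digest(received.encode(), _signature(payload).encode())


url = sign_win_url("https://somewinner.com/adcampaign/submitted-image.jpg",
                   {"price": "2103912", "auction_id": "abc"})
print(verify_win_url(url))                                         # True
print(verify_win_url(url.replace("price=2103912", "price=1")))     # False
```

A signature alone stops forged notifications but not replays; catching those would need something like a unique auction id that is only accepted once, which is not shown here.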

The Ad Exchange will, for purposes of cookie tracking, either include their own user id in the winning ad, or add a specific pixel from your website along with the advertisement. I'll be brief on the details there, but the pixel can be preferred because ad companies will build extensive services that are more or less generic, so they can work with multiple ad networks and whatnot.

In any case, we have defined broad cookie exchange, and it's now possible to come full circle, and identify a user on a client's website along with their identity on one or more ad exchanges.

Other Usage of Cookies
Surprisingly enough, the cookie exchange I mentioned has one big objective, and a bunch of minor ones. The minor ones tend to include:

- More accurate (anonymous) user data
- Better source for decision making and advertisement
- Ability to track click-through rates and do retargeting more efficiently and more cheaply

The big one, the major one is:

- Limit how many times you show an ad to someone.

This is a funny one, but the most important use of cookies from advertisers is to know when to stop showing you ads. This is the most compelling argument for online advertisement.

Compare online advertisement with the traditional kind. You'll know sports fans will watch a hockey game or read hockey magazines. You'll know people are more affluent if you advertise in an aircraft or airport. You'll know people are into food when you go on specific food channels on TV.

This kind of accuracy and profiling is more or less the same online. You have far more specific events, but generally, the profile and market information is on par with (if only slightly better than) what you get through traditional media.

Online mostly makes it easier to know if the user acted on the advertisement, but more specifically lets you know how often you displayed a given ad to a given user.

The pick up truck advertisement I've seen 500 times over the last 2-3 seasons of watching the Canadiens? After 2-3 times, I'm pretty sure the message has gone through and all other times they paid for that was wasted on me.

Online? You can decide to show the ad at most 5 times (for example), as long as the user doesn't clear their cookies. This gives you a spending cap and allows for far better budget allocation.
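That cap is simple to sketch once every impression is tied to a cookie id. The counter below is an in-memory stand-in for whatever store a bidder would really use, and the names are hypothetical:

```python
from collections import Counter

MAX_IMPRESSIONS = 5  # the example cap from the text

# (cookie id, campaign) -> number of times the ad was shown
impressions_seen: Counter = Counter()


def should_bid(cookie_id: str, campaign: str) -> bool:
    """Only bid while this user is still under the cap for this campaign."""
    return impressions_seen[(cookie_id, campaign)] < MAX_IMPRESSIONS


def record_impression(cookie_id: str, campaign: str) -> None:
    impressions_seen[(cookie_id, campaign)] += 1


for _ in range(5):
    record_impression("403232423", "pickup-truck")

print(should_bid("403232423", "pickup-truck"))  # False: cap reached
print(should_bid("new-cookie", "pickup-truck"))  # True: looks like a new user
```

The second call shows the limitation mentioned above: once the cookie is cleared, the user comes back under a new id and the count starts over.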

The gotcha? This is the part of cookie tracking advertisers will never want to give up again, and it fully depends on cookie exchange with the ad networks.

How Things Look Nowadays
So a few paragraphs earlier, I had this drawing:

[Website] <--- [Ad Exchange] <---> [Bidder] <--- [Advertiser] <--- [Client's Website]

In practice, things may become more and more crazy: many ad exchanges contain the data from many other ad exchanges, and ad exchanges and bidders may enrich data with specific analytics engines that crawl the web, try to categorize websites into specific verticals, and gather information about the quality of the content and the demographics that visit specific websites. The whole thing becomes really, really complex and looks more like:

                 [verticals service]
                          |
[Website] <--- [Ad Exchange] <---> [Ad Exchange]
                \         |      __________/
                 \        |     /
                  '---[Advertiser] <--- [Client's Website]
                       /    |    \
               [bidder]     |     [bidder]
                           / \
                [verticals]   [analytics]

If this looks weird in the forum, here's an image: http://i.imgur.com/L0Cxyby.png

And so on. Advertisers may eventually evolve to be their own ad exchanges.

What's "fun" is that when you get many ad exchanges like that, it becomes possible to have multiple rounds of cookie exchange.

Ad Network A will display the ad won by a bidder on Ad Network B, which saw its ad won on Ad Network C, which won on behalf of Bidder 21021.

There will then be a chain of cookies exchanged by all advertisers. Network A only knows about network B and will exchange its cookies with it. Network B will exchange its own cookies with network C, and C will exchange them with the bidder.
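To picture the chain, here is a small sketch where each hop keeps its own sync table and only knows its direct neighbour. The ids and table names are invented, and no single party actually holds all three tables; following them in one function is only meant to show how the ids line up.

```python
from typing import Optional

# One sync table per pair of neighbours (all ids are made up).
a_to_b = {"A:111": "B:222"}
b_to_c = {"B:222": "C:333"}
c_to_bidder = {"C:333": "bidder21021:444"}


def follow_chain(user_id: str, sync_tables: list[dict[str, str]]) -> Optional[str]:
    """Translate an id hop by hop; None if any hop never synced this user."""
    current = user_id
    for table in sync_tables:
        if current not in table:
            return None
        current = table[current]
    return current


print(follow_chain("A:111", [a_to_b, b_to_c, c_to_bidder]))  # bidder21021:444
print(follow_chain("A:999", [a_to_b, b_to_c, c_to_bidder]))  # None
```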

Usually ad networks self-regulate by saying that you cannot publish the user ids and cookies to a third party (and they can verify this by subscribing to your exchange, then ban you from theirs). In practice this seemed to work.

All The Cookies You May Have
So here's the fun stuff. Because of how complex the dependency graphs between advertisers, their competition, their nested ad exchanges and whatnot get, it becomes entirely possible for you to be participating in the ad exchange used by one of your customers' websites on behalf of another customer, and only display information there, on top of having a partnership with that website as their advertiser.

That means that you can get all kinds of tracking when you're a user visiting a site:

- Your session data
- The cookie between the website and its advertiser
- The cookies between you and the website's ad network
- The cookies between you and every single advertisement winner, which may be:
  - Another network
  - Specific bidders
  - Specific companies or individuals

The total set of which can overlap, in that specific bidders can also be advertisers. Most of them will be for tracking purposes, but the only ones under the direct control of the website will be:

- Your session data
- The cookie between you and the website advertiser

Everything else comes mandatorily with participating in an ad exchange. The information submitted by the website is mostly null (it's obtained by the ad exchanges and specific networks), with some specific exceptions. For example, Facebook has its own ad network, and is able to correlate all your personal information with a profile that they can ship to advertisers on each bid request.

The profile is anonymous, but as we know, that kind of information may be correlatable to specific people over time. Google can do similar stuff with their data gathering -- it's likely why they unified all their domain names a few years ago, forcing you to use the same account for Gmail, YouTube and Google Plus, while unifying their privacy policies into one.

All of these major ad networks tend not to use evercookies and other shady practices. This tends to limit the impact evercookies can have, in that the middlemen between a specific advertiser and the user will have their cookies time out, and they'll act as a wall between a visit on a website and an advertiser gathering the information, by making it impossible to match an ad exchange user without paying to win a specific bid request first. The only way the advertiser gets to renew their cookie data is to pay to display an ad to a user they ended up knowing about.

The current 'state of the art' for privacy control has been to shift it to the user's browser. Third-party tracking can be disabled there, and good advertisers and bidders will honor that preference. The mechanism is often known as "Do Not Track".

The 'problem' with it is that companies like Microsoft decided to turn it on by default, making it 'opt-out' rather than 'opt-in' like in other browsers, so advertisers decided to go "you know what, this is not a user decision, therefore we disregard it". Bigger ad organizations in the industry tend not to require honoring that one.
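For reference, the Do Not Track preference reaches the server as an HTTP request header, `DNT: 1`. A well-behaved bidder or pixel endpoint could check it along these lines; the function name and the plain dictionary of headers are assumptions for the sketch:

```python
def tracking_allowed(headers: dict[str, str]) -> bool:
    """Return False when the browser sent the Do Not Track preference."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(tracking_allowed({"DNT": "1"}))                    # False
print(tracking_allowed({"User-Agent": "SomeBrowser"}))   # True
```

Nothing enforces this check; it only works if whoever receives the request decides to respect it.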

Hopefully that was helpful.

Additional Sources
- http://ferd.ca/rtb-where-erlang-blooms.html (this is my blog, content similar to conference talk)
- https://developers.google.com/ad-exchange/rtb/cookie-guide
- http://appnexus.com/cookies
- http://en.wikipedia.org/wiki/Click-through_rate
- http://en.wikipedia.org/wiki/Cost_per_impression
- http://en.wikipedia.org/wiki/Do_Not_Track

Addendum
http://arstechnica.com/security/2014/10/verizon-wireless-injects-identifiers-link-its-users-to-web-requests/

Cases like this, however, are pretty much non-standard and a different version of 'third party'.