Root Causes 227: Let's Talk About Cookies
In this episode we explain the fundamentals of cookies and why, despite their obvious benefits, they present troublesome privacy concerns. We discuss the many ways web users can be tracked including cross-site cookies, tracking pixels, and browser fingerprinting.
- Original Broadcast Date: May 27, 2022
Episode Transcript
Lightly edited for flow and brevity.
-
Tim Callan
We want to talk about cookies.
-
Jason Soroko
I really like, well, I love chocolate chip. So obviously, we are talking about browser cookies, Tim. They haven’t always been around. In my very first browsing experience, there were no cookies. I used Lynx, which was just a text-based World Wide Web interface, and I was telnetting into CERN way back in 1992. This didn’t exist. Cookies didn’t exist.
-
Tim Callan
Sure. Cookies are a World Wide Web thing. If I had to hazard a guess, and I’m going to gamble here and then people are going to yell at their screens that I’m stupid, I bet you that was a Netscape invention.
-
Jason Soroko
Actually, your guess was absolutely, probably better than a guess. It was right on the nose.
In 1994, it went into Netscape. And so, why are they called cookies? Think about a fortune cookie. That’s actually where the idea comes from: it’s something where a message, or a piece of data, is embedded. Just like a fortune cookie.
That’s where the term apparently actually came from. The idea of a cookie, or a magic cookie, which is what it was called, meaning a piece of data, a packet of data, that was created, sent, and received without being changed, was actually used in Unix for other purposes even before it was brought to Netscape and became the cookie that we know and love, and sometimes don’t love, today. That’s kind of the history, and I think that’s important for people to understand.
But there are a lot of types of cookies. Now, they are all formed very, very basically. A cookie contains a name, the cookie’s name, and then there’s a value, and all cookies are based on these name-value pairs: name, value; name, value; name, value. And, of course, you can have a cookie name that’s just some block of text, and then the value could be the domain that you’re at, or a hashed piece of data, something or other that allows us to look up a database entry we’ve just created and know that something is in your shopping cart.
Therefore, let me just spell it out. Cookies were put into browsers because the World Wide Web is essentially stateless. Every time your browser, as a client, makes a request to a web server, that request is essentially stateless if you go down far enough into the technology. Cookies are what let you have something like a shopping cart, where you’re browsing away and the site remembers the fact that you’ve chosen certain things, or the fact that you’ve... and this is where it gets close to us, close to our hearts, Tim: authentication.
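To make the name-value idea and the statelessness point concrete, here is a minimal Python sketch using the standard library’s `http.cookies` module. The in-memory `SESSIONS` store, the `session_id` cookie name, and the `handle_request` helper are hypothetical illustrations, not how any particular site works; a real site would use a web framework and a persistent store. The cart itself stays on the server, and the cookie only carries an opaque ID that lets the server find it again on the next request.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory session store: opaque session ID -> shopping cart items.
SESSIONS: Dict[str, List[str]] = {}


def handle_request(cookie_header: str, add_item: Optional[str] = None) -> Tuple[str, List[str]]:
    """Handle one stateless request; return (Set-Cookie value, current cart)."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse "name=value; name2=value2" pairs from the Cookie header
    morsel = jar.get("session_id")
    if morsel is not None and morsel.value in SESSIONS:
        sid = morsel.value  # returning visitor: find the state the server kept
    else:
        sid = secrets.token_hex(16)  # new visitor: issue a random, meaningless ID
        SESSIONS[sid] = []
    if add_item is not None:
        SESSIONS[sid].append(add_item)
    reply = SimpleCookie()
    reply["session_id"] = sid
    return reply["session_id"].OutputString(), SESSIONS[sid]


# The browser's only job is to echo the cookie back on each later request.
set_cookie, cart = handle_request("", add_item="pants")
set_cookie, cart = handle_request(set_cookie, add_item="shirt")
print(cart)  # ['pants', 'shirt']
```

Without the echoed `session_id`, the second request would look like a brand-new visitor with an empty cart, which is exactly the statelessness problem cookies were added to solve.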
-
Tim Callan
You are who you are. This is really your online shopping account, this is really your social media account, this is really your bank account.
-
Jason Soroko
There are session cookies and authentication cookies, and remember, from a security standpoint, those can be vulnerable to session hijacking. Just an asterisk there. You also have things such as persistent cookies. In other words, rather than ending with the particular session that you’re on, a cookie might last well into the next day, or days later, when you’ve completely forgotten that you browsed to a certain site. Those persistent cookies are also what we call tracking cookies, and those are the ones that have gotten cookies into trouble.
There are also a lot of other attributes that have been added to cookies over the years. For example, there’s an attribute that says, only send this cookie with requests from the same site. There was a real problem, Tim, with cross-site request forgery. In order to thwart that attack, the browsers had to get together and say, alright, we have to allow people who are building websites to say: see this cookie right here? Only send it along when the request comes from our own site. That helped with that attack. And then, Tim, you, as the builder of what we would today call a web app, might also be serving ads on your website, and those banner ads might set cookies of their own. We would call those third-party cookies, or banner ad cookies.
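The cookie types Jason lists map onto attributes of the `Set-Cookie` response header: a cookie with no `Expires` or `Max-Age` is a session cookie, one with them is persistent, and `SameSite` controls whether it travels on cross-site requests. The sketch below uses Python’s `http.cookies` (the `samesite` attribute needs Python 3.8+); all cookie names and values are hypothetical, and the third-party `ad_id` cookie is shown in the same jar only for side-by-side comparison, since in practice the ad server’s own response would set it.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()

# Session cookie: no Expires/Max-Age, so the browser drops it when the session ends.
jar["session_id"] = "3f9a1c"               # hypothetical opaque ID
jar["session_id"]["httponly"] = True       # not readable by page JavaScript
jar["session_id"]["secure"] = True         # sent only over HTTPS
jar["session_id"]["samesite"] = "Strict"   # withheld on cross-site requests (CSRF defense)

# Persistent cookie: kept for 30 days, even after the browser is closed.
jar["lang"] = "en"
jar["lang"]["max-age"] = 30 * 24 * 60 * 60

# Third-party style cookie, as an ad server might set it: SameSite=None lets it
# ride along when its banner ad is embedded in someone else's site.
jar["ad_id"] = "u-7781"                    # hypothetical tracking ID
jar["ad_id"]["samesite"] = "None"
jar["ad_id"]["secure"] = True              # Chrome, for one, rejects SameSite=None without Secure
jar["ad_id"]["max-age"] = 365 * 24 * 60 * 60

for name in ("session_id", "lang", "ad_id"):
    print("Set-Cookie:", jar[name].OutputString())
```

The one-year `Max-Age` plus `SameSite=None` combination is what lets the same ID follow a browser from site to site, which is why persistent third-party cookies draw the privacy criticism discussed here.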
So, there are many other types. That’s a non-exhaustive list, but I think those are the ones that interest us. I think what’s important now, Tim, is to recognize that it’s not the normal session cookies giving cookies a bad name. It’s the persistent and third-party cookies that are the problem.
-
Tim Callan
Exactly. Nobody’s upset about the fact that I go back to my online shopping account, and it knows who I am. That doesn’t bother any human being who I’ve ever met. If we didn’t want that, we could just disable cookies, and all these problems would go away, but who among us would be willing to live without that particular bit of functionality?
-
Jason Soroko
We absolutely don’t want to live without it, and so cookies are probably, in some form, going to be with us for an awfully long time. But let’s get to the “cookies are evil” part, because your privacy while browsing around the Web is under attack. Let’s face it. It’s not just benign advertisers saying, you just browsed to buy some pants, so maybe I’ll show you some other pants in a banner ad in your next browsing session. Maybe you don’t like that, but there are even more nefarious things going on in the Web that attack your privacy. I’m going to avoid getting in depth into what all those really terrible things are, but I am going to get into a little bit of depth about the reaction to them. The backlash against cookies started a long time ago, but I’d like to specifically go back to 2002, when an EU law came out that said, look, and I’m paraphrasing here in my own language to simplify it, we only want you to use cookies for their original intended purposes. In other words, shopping carts, authentication, basic session management. Don’t do anything else with them. That tracking stuff, we just don’t want you to do it. The law was written in a way where it was kind of ignored, and it wasn’t really enforced. It didn’t have a lot of teeth, and I think part of the reason it didn’t have a lot of teeth was that people just didn’t understand how to enforce it, and therefore - -
-
Tim Callan
What you said to me was very much subject to interpretation. I understand that’s a paraphrase, but if it was that general, then we would need to ask ourselves, who’s supposed to be the arbiter of whether or not a cookie is being used for its original intended purpose?
-
Jason Soroko
Exactly. So, because there was so much gray area and it was very, very difficult to interpret, and therefore difficult to enforce, it really didn’t go anywhere. But in 2009... Tim, have you ever been in Europe or browsed to European websites? It’s a different experience, isn’t it? I’d love to hear what you think about this, but when I’m browsing any kind of European website, or even worse, when I’m in Europe, I’d go as far as to say it’s almost unusable. Even though I’m browsing for legitimate purposes and everybody is happy, like, I want to browse your site, so please give me your information, I’ve got to go through what feels like a gauntlet, a wall of cookie allowance screens, opting in, opting out. And the sites that are not so great are the ones that don’t make it really clear how to say, yes, I will accept your cookies, please let me browse, or no, I’m going to reject all of them, just give me what you’ll allow me to see if I reject. In 2009, the law became a lot clearer, still not perfectly clear, but a lot clearer, saying that if you have a website in Europe, you must obtain consent before storing a cookie.
-
Tim Callan
That’s why we see those warnings everywhere.
-
Jason Soroko
What’s changed today? Because 2009 was a while ago; we’ve been living with that for a while. The problem is this: so many websites have stretched the spirit of the law. Again, the wording wasn’t super clear, or it wasn’t enforced; there are a number of issues other people could debate, and I won’t debate them here. But what Europe is deciding right now is, oh my goodness, we actually need to word this in such a way that the consent mechanism is ultra, ultra clear. Either yes, let me accept all your cookies, or give me an incredibly easy way to accept a set of your cookies, or a button that says reject all, which apparently in Europe is the rarest thing of all. And so, Tim, I think this is going to end up being a series of podcasts, because we should probably leave that there.
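The three choices Jason describes (accept all, accept a set, reject all) amount to a consent gate in front of every non-essential cookie. The Python sketch below is a hypothetical illustration, not any site’s or regulator’s actual mechanism: the category names, the cookie-to-category mapping, and `allowed_cookies` are all invented for the example, and it assumes essential cookies (sessions, carts, authentication) fall under the law’s “strictly necessary” exemption and need no consent.

```python
from enum import Enum
from typing import Dict, Set


class Category(Enum):
    ESSENTIAL = "essential"      # sessions, shopping carts, authentication
    ANALYTICS = "analytics"
    ADVERTISING = "advertising"


# Hypothetical mapping of each cookie this site might set to a consent category.
COOKIE_CATEGORIES: Dict[str, Category] = {
    "session_id": Category.ESSENTIAL,
    "cart_id": Category.ESSENTIAL,
    "analytics_id": Category.ANALYTICS,
    "ad_id": Category.ADVERTISING,
}

# The banner choices that should be equally easy to make.
ACCEPT_ALL: Set[Category] = set(Category)
REJECT_ALL: Set[Category] = {Category.ESSENTIAL}  # assumption: essential needs no consent


def allowed_cookies(consent: Set[Category]) -> Set[str]:
    """Return the cookie names the site may set under the visitor's choice."""
    granted = consent | {Category.ESSENTIAL}
    return {name for name, cat in COOKIE_CATEGORIES.items() if cat in granted}


print(sorted(allowed_cookies(REJECT_ALL)))             # ['cart_id', 'session_id']
print(sorted(allowed_cookies({Category.ANALYTICS})))   # ['analytics_id', 'cart_id', 'session_id']
```

Note that “reject all” still leaves the shopping cart and login working, which is the distinction between session cookies and tracking cookies drawn earlier in the episode.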
I’m going to get into two other topics related to cookies in a moment, but eventually we’re going to talk about what the browsers have done. If the European laws change such that sites have to make it easy to reject cookies, a lot of people are just going to reject all. And if enough people reject all, business models might have to change. So what is the industry reaction?
-
Tim Callan
This is worth talking about. There’s a famous quote: you are the product. A good rule of thumb is that if someone is providing something to you for free that costs them money to produce, such as a social media site, there is a reason why. The reason has to do with you as somebody who is being tracked or measured or sold to or something along those lines, and that is heavily enabled by cookies. If everyone universally rejected cookies, and everyone in the world said I am not going to do anything that involves cookies, then those businesses fundamentally wouldn’t be able to exist, or if they did exist, they’d exist in a radically different way. If advertising had to be general and just poured out to the masses, rather than specific to your usage or your demographics in some way, that would cut down on the revenue dramatically, because you wouldn’t be able to have this targeted advertising model. It’s huge. Maybe I’m going to come across as a defender of this stuff here, but at the end of the day, if somebody is going to put together something very expensive, like a social media business or some kind of online entertainment business, and they have to fund it, let’s say through advertising, then they are going to pursue the strategy that is most expedient for selling the advertising they need to operate the site. And your part of the bargain for getting this thing for free is that there’s going to be some kind of tracking built in. Some of us can individually opt out of that bargain, and if a small enough number of people do it, that’s fine, but you do kind of get into this prisoner’s dilemma situation, where if everybody does it, then it all falls apart for everybody.
So, I think what you’re saying, if I may paraphrase, Jason, is the question is being asked, well how do we preserve people’s privacy without just destroying this whole construct that we’ve collectively come up with over the past 25 years. Is that right?
-
Jason Soroko
Absolutely, and it was all built on the back of cookies. You said it so well. I don’t want to repeat anything you just said, because it should just stand as is. Keep in mind that all of us listening to this podcast, even the oldest of us, have been marketed to for the entirety of our lives. A lot of it isn’t a problem. I wish the commercials on TV were maybe a little bit shorter, but in reality, I know that a lot of that good content exists because it’s being paid for by somebody else. That’s great.
Tons of business models are based on this. So, we do not want to throw out the baby with the bathwater. To make this podcast even juicier, Tim, and not make it just about something as mundane as cookies, I want to talk about two more topics quickly.
And that is, there are other technologies out there that do similar things - -
-
Tim Callan
That essentially do the same thing.
-
Jason Soroko
And specifically, let’s bring at least one up: tracking pixels. You’ve come across those, haven’t you, Tim?
So, I first became aware of tracking pixels when I was actually helping out a marketing department way, way, way back in the day. The idea was that if you open up an e-mail, there’s a snippet of code in the HTML of the e-mail that actually downloads a single pixel. That single pixel was set up such that the request to download it would notify the marketer - -
-
Tim Callan
That the e-mail had been opened.
-
Jason Soroko
The e-mail had been opened. You got it. Now, think about all the banner ads out there, etc. These also contain those same kinds of pixels, and what marketing people use them for is to measure campaign performance. We’ve all had this experience where you go shopping for X, say, a pair of pants. Well, the pixel in the checkout could be combined with your browser profile, and we’ll talk about what browser profiles, or browser fingerprints, are in a moment. That profile says you’re the type of person who likes to shop for clothes, or for pants very specifically. Therefore, when you browse to a social media site, that site can be notified, through a cookie that gets installed, that this person should see ads for X, a pair of pants. Your browser fingerprint, which in totality will include cookies and a history of pixel downloads, means you as a person can be profiled and targeted more effectively. There are really two benign uses of that. One is, of course, just simple advertising: making advertising more effective. That’s not necessarily a terrible thing. But there’s also something else, Tim. Say I’m going to my banking site, and I bank with a particular browser fingerprint day after day after day. Every time the bank sees me, it’s probably seeing the same thing. All of a sudden, something strange occurs: an authentication happens with a different browser fingerprint. Maybe that’s not me, and maybe that will cause the bank to say, is that really you? and require step-up authentication, as an example. That’s another benign reason to do browser fingerprinting.
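The mechanism behind a tracking pixel is small enough to sketch. In the hypothetical Python example below, the tracker URL, the `c` (campaign) and `r` (recipient) query parameters, and both helper functions are invented for illustration: the marketer embeds a 1x1 image whose URL encodes who received which message, and the server treats the image request itself as the “opened” signal before returning a transparent GIF.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical tracking endpoint; any domain the marketer controls would do.
TRACKER = "https://tracker.example/open.gif"

# A 1x1 transparent GIF: the "single pixel" the e-mail asks the client to download.
PIXEL_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"\x21\xf9\x04\x01\x00\x00\x00\x00\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"
             b"\x02\x02\x44\x01\x00\x3b")

OPEN_EVENTS: List[Dict[str, str]] = []  # what the marketer's dashboard would read


def pixel_tag(campaign: str, recipient_id: str) -> str:
    """HTML to embed in the e-mail; the query string says who opened which campaign."""
    query = urlencode({"c": campaign, "r": recipient_id})
    return f'<img src="{TRACKER}?{query}" width="1" height="1" alt="">'


def serve_pixel(request_url: str, user_agent: str) -> bytes:
    """Server side: the request itself is the signal, so log it, then return the image."""
    params = parse_qs(urlsplit(request_url).query)
    OPEN_EVENTS.append({
        "campaign": params["c"][0],
        "recipient": params["r"][0],
        "user_agent": user_agent,  # request headers add fingerprint data for free
    })
    return PIXEL_GIF


tag = pixel_tag("spring-pants", "42")
src = tag.split('"')[1]  # the URL the mail client fetches when it loads images
body = serve_pixel(src, user_agent="Mozilla/5.0 (hypothetical)")
print(OPEN_EVENTS)
```

The same pattern works in banner ads and checkout pages; there, the pixel request can also carry the tracker’s own cookie, which is how it links a purchase to a browser profile.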
Let’s talk about browser fingerprinting for a moment, and this is the final bit here, Tim, because I don’t think people realize just how deep this goes. Even if you got rid of cookies, the science of browser fingerprinting has gotten to the point where you could be using an ad blocker, one of the many advertising blocker extensions for your browser, and you could still come up completely unique and be fairly uniquely identified. Let me give you an example. A lot of people think that a browser profile is your IP address. That’s maybe one data point, and it’s not even one of the great ones. People who are a bit more technical know that when your browser makes a request, it sends something called a user agent to the website you’re browsing to. It’s part of the request header, and it contains information like whether you’re using Safari or Chrome, what version, etc. But again, that’s not terribly, terribly unique. You and I, Tim, if we were both using Safari on the same day, might have very similar user agents, especially if we were both up to date. Keep in mind, when you are browsing, you are also giving up information such as what your primary language is set to (in my case, English), what your time zone is, and what your operating system is. What a lot of people don’t realize is that there are also things such as your screen size, your color depth, and your system fonts that scripts on the page can read, because websites want to know how to present information to you, and you’re giving up that information through your browser. Additionally, there’s something called a DNT header, for Do Not Track, that you can include in your request to say, do not track me. Believe it or not, that is also, and I’m going to introduce this idea, Tim, a bit of entropy that helps uniquely identify you, because not everybody has the DNT header turned on. I do, but a lot of people don’t.
And that actually more uniquely identifies me.
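The attributes listed above can be combined into a single identifier by hashing them, and the bank scenario is then just a lookup against previously seen identifiers. The Python sketch below assumes hypothetical attribute values and an exact-match comparison for simplicity; production fingerprinting and risk systems collect many more signals and may score similarity rather than require an exact match.

```python
import hashlib
import json
from typing import Dict, Set


def fingerprint(attrs: Dict[str, str]) -> str:
    """Hash attributes into one stable identifier (canonical JSON, then SHA-256)."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_step_up(seen: str, known: Set[str]) -> bool:
    """Bank-style check: an unfamiliar fingerprint triggers extra authentication."""
    return seen not in known


# Hypothetical values a site could read from request headers and browser APIs.
usual = {
    "ip": "203.0.113.7",
    "user_agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15"),
    "accept_language": "en-US,en;q=0.9",
    "timezone": "America/Toronto",
    "platform": "MacIntel",
    "screen": "1440x900",
    "color_depth": "30",
    "fonts": "Arial,Helvetica,Menlo",
    "dnt": "1",  # the Do Not Track header is itself one more distinguishing value
}

known_devices = {fingerprint(usual)}
traveling = dict(usual, timezone="Europe/Paris")  # a single attribute changed

print(needs_step_up(fingerprint(usual), known_devices))      # False
print(needs_step_up(fingerprint(traveling), known_devices))  # True
```

Sorting the keys before hashing makes the identifier independent of the order in which attributes were collected, so only the values themselves matter.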
I’ve got one more little piece to this, because I’m going to show you how deep it goes. There’s something out there called FingerprintJS, short for Fingerprint JavaScript library. It actually runs in the background, kind of unbeknownst to you; it’s almost like that single-pixel idea. It will run an HTML5 Canvas or a WebGL function. I’m sure you’ve been to websites, Tim, that have really elaborate, fancy, beautiful graphics. Well, even cut down to their absolute minimum, HTML5 Canvas and WebGL will also let a site learn about your GPU and your graphics drivers, information that ultimately becomes hashed into what’s known as the canvas fingerprint or the WebGL fingerprint, all the way down to, like, your graphics driver version number. It’s incredible.
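In a browser, a canvas fingerprinting script draws text and shapes on a hidden canvas and reads the pixels back (for example with `getImageData()` or `toDataURL()`); differences in GPU, graphics driver, and font rendering nudge a few pixel values. The Python sketch below shows only the hashing step, with hypothetical RGBA byte strings standing in for the read-back pixels, to illustrate why a one-unit difference in a single color channel yields an entirely different fingerprint.

```python
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash raw RGBA pixels read back from a hidden test drawing (truncated for display)."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Hypothetical read-backs of the same test drawing on two machines: five RGBA pixels,
# identical except that one red channel differs by 1 due to rendering differences.
machine_a = bytes([255, 102, 0, 255] * 4 + [254, 101, 0, 255])
machine_b = bytes([255, 102, 0, 255] * 4 + [255, 101, 0, 255])

print(canvas_fingerprint(machine_a))
print(canvas_fingerprint(machine_b))
```

Because the same hardware and driver stack renders the drawing the same way every time, the hash is stable for one machine yet differs across machines, which is what makes it useful as a tracking signal.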
-
Tim Callan
Maybe you’re not truly unique, but the level of intersection of exact matching between two systems has to be phenomenally low.
-
Jason Soroko
Tim, absolutely. All of these things become bits of entropy that determine your uniqueness. It even goes as far as audio, Tim. Browsers also deal with audio, and therefore your audio information is fingerprinted too. So is your hardware concurrency, meaning how many processor cores your browser reports.
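“Bits of entropy” has a simple arithmetic meaning: an attribute value shared by one in N browsers carries log2(N) bits of identifying information. The Python sketch below uses hypothetical one-in-N figures and an illustrative pool of 200,000 browsers, and it assumes the attributes are independent so their bits simply add (real attributes are correlated, so this overstates the total). Roughly log2(200,000), about 17.6 bits, is enough to single out one browser in that pool.

```python
import math
from typing import Dict


def bits(one_in: float) -> float:
    """Identifying bits for a value that 1 in `one_in` browsers share: log2(one_in)."""
    return math.log2(one_in)


# Hypothetical "1 in N browsers share this value" figures for one visitor.
one_in: Dict[str, float] = {
    "user_agent": 150.0,
    "timezone": 40.0,
    "screen_size_and_color_depth": 25.0,
    "canvas_hash": 900.0,
    "dnt_header": 3.0,
}

POPULATION = 200_000  # illustrative size of the comparison pool
total = sum(bits(n) for n in one_in.values())       # simple sum ASSUMES independence
needed = math.log2(POPULATION)                      # ~17.6 bits singles out one browser
expected_matches = (POPULATION - 1) / 2 ** total    # other browsers expected to share every value

print(f"{total:.1f} bits observed; {needed:.1f} needed; "
      f"expected look-alikes: {expected_matches:.4f}")
```

When the observed bits exceed the bits needed, the expected number of look-alike browsers drops well below one, which is the quantitative version of Tim’s point that exact matches between two systems are phenomenally rare.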
-
Tim Callan
You mean, like, my output device, my volume level, things like that?
-
Jason Soroko
Tim, websites can ask for that information, and browsers will give it. Here’s something interesting, Tim. Let’s leave it here for now. I invite everybody to check this out, and, again, I’m not an advocate; I’m not a detractor; I’m merely giving information here. There’s something called coveryourtracks.eff.org. The EFF, the Electronic Frontier Foundation, is a privacy advocacy group.
So, a bit of a shoutout to them, just because I found it very fun to go to that site. They’ll tell you how unique your browser is and whether you’re trackable. They calculate entropy based on your computer and browser combination, and they tell you how many bits of entropy you are giving out publicly when you browse to websites. Very interestingly, when I did this, Tim, it told me that out of the more than 200,000 browsers the EFF had tested in the past 45 days, mine was completely, 100% unique, and therefore utterly trackable, which I found very interesting.
-
Tim Callan
I was going to guess that for your typical website, probably everyone or damn near everyone who goes there is uniquely trackable.
-
Jason Soroko
When you add up all the bits of data... you and I might both be running a Mac. We might even have the same audio driver. We might just have that, but to be identical on all those other things, including our IP addresses? Forget it. You’re not going to be completely the same on every single bit of information. So, there it is, Tim. I just wanted to give the history and background of cookies, the other ways that you’re tracked, and ultimately what it comes down to: is your tracking anonymous? Can you be uniquely identified? The answer is, absolutely. And that has really big implications for identity, which is a big topic for this podcast. Next, I want to talk about, alright, now that we know the EU is cracking down on this but also doesn’t want to throw out the baby with the bathwater, what is the industry reaction? I think that might cover at least one or two more podcasts, Tim.
-
Tim Callan
I think that’s a great starting point. It’s a great level set. I can start to imagine some of the abuses that could come if you really could uniquely identify any given machine. If I know that the machine that was on this site is the same as the machine that was on that site, I can start to put the pieces together and essentially violate people’s privacy. So you see why people would be concerned about that, and in particular why the EFF and the EU, two entities that care very much about these things, would be pushing on it. Let’s leave it there, and I’d love to come back next session and talk about the response.
-
Jason Soroko
Stand by. There are actually a lot more interesting things going on with this now, Tim. It’s funny. We’ve been living with this since 1994, and all of a sudden, things are coming to a head, and it’s going to change the way we browse. Stay tuned.