Podcast Jun 12, 2020

Root Causes 99: AddTrust Root Expiration Explained

The recent expiration of Sectigo's AddTrust legacy root caused some systems to stop working and forced some admins to keep working over the weekend until all was fixed. In this episode we explain roots, root expirations, why they are a non event for most users, and why sometimes an expiration can be more impactful.

  • Original Broadcast Date: June 12, 2020

Episode Transcript

Lightly edited for flow and brevity.

  • Tim Callan

    So, today, we want to talk about some big news that went on very recently around the AddTrust legacy root expiration.

  • Jason Soroko

    AddTrust. So, that was a legacy root. Is that right, Tim?

  • Tim Callan

    Yeah. So, that was a root that was owned and operated by us, Sectigo, that came up for expiration. It came up for expiration on May 30 and got a lot of attention on social media and made some headlines and so, I think today we want to explain why roots expire. What happened and maybe what people can do in the future to ensure that their enterprises are more proofed against problems and errors that might come from root expirations.

  • Jason Soroko

    Sure, Tim. So, you and I talk quite a lot about revocation.

  • Tim Callan

    Yeah.

  • Jason Soroko

    Usually what we are talking about are leaf certificates and entity certificates, whatever you want to call them. It’s basically the certificates that are used by the user, the web server, the device, etc. but in this case, we are talking about basically the key material that is related to the actual root itself.

  • Tim Callan

    Yeah. So, when we say certificates, most people think about that thing that gets issued that somebody puts on a server somewhere in the world. And those are certificates, but there are other types of certificates as well. And the two that immediately come to mind are root certificates and intermediate certificates. So, what happens is, you know, why does trust happen at all in a client system. Well, so when you issue a certificate, that certificate is cryptographically bound to something called a root and the root is controlled by the certificate authority. By the company that’s issuing that certificate originally. And because of that, because of that cryptographic binding you can have assurance that the end certificate really came from the authority that controls that root. Now, in the world of private PKI, which is what PKI originally was, that was whoever owned the network. So, if you went back in time to 1989 you might have a corporate network and in your corporate network you would issue certificates and you were the certificate authority. And so, where that trust came from was that the CA was the person who was responsible for that network and so that was very clean. The other thing that was very clean about that was you had control over the whole system. I own the clients. I own the servers. I can change it or swap it out or do whatever I want. When the public internet came along, all of that got a little more complicated because now what we were gonna do is we were gonna issue certificates to systems that were not controlled by the people who ultimately wanted to establish identity on their servers and that’s where the root system, the modern root system that we have - - or not the modern, the original system that we have came in. So, what would happen is a public CA like Sectigo, though Sectigo was not a CA back in those days, but a public CA like Sectigo would get its roots embedded in these systems that were shipped out. 
So, you would go to let’s say Microsoft’s root program and you’d meet their requirements, and they’d say, ok, you’ve met our requirements, whatever those were. Those are documented. Now, I’m going to start shipping systems with your roots in them and then when your servers connect to those systems the roots will be trusted. Right? The certs will be trusted because that root is sitting there locally on that system. And then there was a system for manually installing roots, which didn’t happen very much. And there’s trouble with that. Right? The trouble with that is that you can’t really make new roots very well. If I create a new root today and I get it onto computers tomorrow, then that’s the earliest day that people can buy a computer that will actually connect successfully to my certificate.

    So, the way we solve that in the long term is that all got solved with auto update. Right? Once we got to the point where systems were routinely auto-updating, we could auto-update roots. So, if you have a Windows or a Mac desktop or you have an iOS or an Android phone or Windows phone, those are getting auto updates on a routine basis. And part of what they are getting is they are getting new roots. Right? So, if I want to create, if I want to become a CA today and I want to join the Firefox root program, the Mozilla root program, I go, I meet Mozilla’s requirements. Mozilla says you met my requirements, they add the root to the root program and then when the Firefox auto update rolls around those Firefox installations that are already out in the world get the new root. So, I get backward compatibility on those roots. So, you understand where that’s a flexible, agile robust system. But the problem is, what about old systems? What about systems that were in use before we had auto updates. Right? So, once upon a time, I didn’t have this stuff and the auto update came in in the late 2000s and in that kind of timeframe I could start to rely on this. But there are systems around from before that where they just won’t connect and the response to that is legacy root programs.

    So, a public CA like Sectigo can take one of these old roots that got embedded into these systems a long time ago and they can basically cross-certify to it. So, what happens is the cert on the machine tries to go up and follow the modern root up to the trust level. But if that root isn’t available, let’s say because it’s an older machine, if the certificate itself is cross-certified to a different root, an older root that may be on that machine, what we call a legacy root, then it will follow that chain up and it will cross-certify to that as well.

    So, it’s common practice to support older systems, it’s common practice for public CAs to cross-certify to roots that span back to before there was auto update. And all else being equal, the further back the better. Right? Because there are more people who can connect and fewer people who are locked out. And so, Sectigo had one of these roots. It was called AddTrust. It was established in the year 2000 with a 20-year lifespan and that certificate was there as a fallback and alternate. Sorry, that root was there as a fallback and alternate to help older legacy systems that didn’t have our modern Comodo root help them still establish trust and that certificate expired on May 30, 2020.
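The cross-certification idea Tim describes can be sketched as a toy path builder. This is an illustrative model only, not how any real TLS library is implemented: certificates are reduced to a subject, an issuer, and an expiry date, no signatures are checked, and the names `Modern Root`, `Legacy Root`, `Issuing CA`, and `www.example.test` are hypothetical placeholders. The May 30, 2020 date comes from the episode; the other dates are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: date


def find_chain(cert, pool, trust_store, today, seen=()):
    """Return the subjects from cert up to a trusted root, or None.

    trust_store maps a root's name to that root's expiry date.
    pool holds intermediate and cross-certificates the client may use.
    """
    if cert.not_after < today:
        return None  # an expired certificate cannot be part of a valid chain
    root_expiry = trust_store.get(cert.issuer)
    if root_expiry is not None and root_expiry >= today:
        return [cert.subject, cert.issuer]  # reached an unexpired local root
    for parent in pool:
        if parent.subject == cert.issuer and parent not in seen:
            rest = find_chain(parent, pool, trust_store, today, seen + (cert,))
            if rest:
                return [cert.subject] + rest
    return None


# Hypothetical certificates; only the 2020-05-30 date is from the episode.
LEAF = Cert("www.example.test", "Issuing CA", date(2021, 1, 1))
POOL = [
    Cert("Issuing CA", "Modern Root", date(2030, 1, 1)),
    # Cross-certificate: the modern root's name, signed by the legacy root.
    Cert("Modern Root", "Legacy Root", date(2020, 5, 30)),
]
MODERN_CLIENT = {"Modern Root": date(2038, 1, 1), "Legacy Root": date(2020, 5, 30)}
LEGACY_CLIENT = {"Legacy Root": date(2020, 5, 30)}

for label, store in (("modern", MODERN_CLIENT), ("legacy", LEGACY_CLIENT)):
    for day in (date(2020, 5, 1), date(2020, 6, 1)):
        print(label, day, find_chain(LEAF, POOL, store, day))
```

In this model the modern client finds a chain on both dates, while the legacy client, which only knows the older root, finds one before the expiration and none after it.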

  • Jason Soroko

    Right, Tim. So, the consequences though of that are that some systems may still be tied up to it or rooted up to it.

  • Tim Callan

    Correct. Yes. And you expect there to be some and you’d expect it to be a very low percentage of systems and it in fact, is a very low percentage. But, of course, there’s a large, large number of systems out there. So even a very low percentage, right, even something south of .01% can still in terms of numbers that the average lay person thinks about while we are walking around in our day, seem like a whole lot. And it’s even more complicated than that because it isn’t necessarily just systems. It can be software. So, I may have written software that is looking for a root and in itself has a trust root store. And if I’m running that software, regardless of what hardware it’s on, that could still be a problem and so that’s one of the things that we saw on May 30 is some of the things that we discovered by way of example is that there are older implementations of OpenSSL and OpenLDAP that did not support chaining to a modern root as expected. Basically, there was a bug, and nobody knew about this bug because they were always just chaining up to an old legacy root and it was all fine. But take away that legacy root and all of a sudden this software that shouldn’t be breaking is breaking because it’s not chaining correctly to the modern root. So, that sort of thing started to happen to various systems in various places over the weekend. On Saturday. Because, of course, unfortunately, this had to happen on a weekend and then as the root expiration occurred, of course we live in a world of very complicated interlocking systems. So, I may have hundreds of different work streams running in my enterprise and if you shoot the wrong one, you take them all down. And so even systems that themselves were perfectly fine in terms of their trust stores and their root chaining were not fine because they were dependent on something else and that something else stopped working.
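One way to picture the kind of chaining problem Tim mentions is a client that has the modern root in its trust store but still fails. The sketch below is a deliberately simplified assumption about how such a failure could arise; the episode does not describe the actual OpenSSL or OpenLDAP behavior in detail, and this is not their code. Both function names, the certificate names, and all dates other than May 30, 2020 are hypothetical, and neither function checks signatures or subject/issuer linkage.

```python
from datetime import date

# Chain as a server might present it: (subject, issuer, not_after).
PRESENTED = [
    ("www.example.test", "Issuing CA", date(2021, 1, 1)),
    ("Issuing CA", "Modern Root", date(2030, 1, 1)),
    ("Modern Root", "Legacy Root", date(2020, 5, 30)),  # cross-certificate
]

# This client trusts both roots locally (root name -> root expiry).
TRUST_STORE = {
    "Modern Root": date(2038, 1, 1),
    "Legacy Root": date(2020, 5, 30),
}


def naive_verify(presented, trust_store, today):
    """Walk the entire presented chain, then look for a root at the very top."""
    for _subject, _issuer, not_after in presented:
        if not_after < today:
            return False  # the expired cross-certificate sinks the whole chain
    top_issuer = presented[-1][1]
    return trust_store.get(top_issuer, date.min) >= today


def stop_at_trusted_verify(presented, trust_store, today):
    """Stop as soon as a certificate's issuer is an unexpired local root."""
    for _subject, issuer, not_after in presented:
        if not_after < today:
            return False
        if trust_store.get(issuer, date.min) >= today:
            return True  # anchored in the trust store; ignore the rest
    return False


for day in (date(2020, 5, 29), date(2020, 6, 1)):
    print(day, naive_verify(PRESENTED, TRUST_STORE, day),
          stop_at_trusted_verify(PRESENTED, TRUST_STORE, day))
```

Both strategies agree while the legacy root is valid, which is why a difference like this could go unnoticed for years; they only diverge once the legacy root and its cross-certificate expire.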

    So that’s what went on and, you know, a bunch of people showed up on social media saying I have a problem. My thing’s not working. I don’t know why. And other people chimed in and said, you know, I think it’s this root rollover or this root expiration and sometimes it was and sometimes it wasn’t and there were people who wrote blogs about it and there were a few articles that appeared. So, you and I just thought it would be a good idea to kind of lay it all out, what happened, why it happened. Why roots have to expire. Right? And so, you know, that’s basically what went on. This legacy root expired, and I think there was a - - the number of systems that were affected was greater than expected and part of that was these unknown softwares. Right? That was part of it. So, systems that themselves should have been fine, that everybody expected to be fine, had always been fine, suddenly weren’t fine. Part of it was the extreme interconnectedness of things and part of it was just the very large numbers that we are dealing with.

    So, you know, some people had a very bad weekend. Right? And some people got called in and they had systems that were down, and they were working on fixing those systems and a lot of them didn’t understand, I think, that they had this dependency. That they had these legacy systems and one of the things that also comes out of this is, you know, oftentimes, the systems that depend on the legacy roots, those are the oldest ones. Those are the first ones. And as such, they tend to be at the center of everything. You know, if there is the foundational database that everything was built on, guess what? That’s the old stuff. And so, when things went down there was a tendency for them to be services that other services had been built on and ultimately depended on and that also made it worse for the people who were having an issue and it made it worse for them to determine their crypto readiness anyway. Right? Because when you get into these old, old legacy systems - - think about a system you are still running in the enterprise that’s 15 years old. You know? What would be some of the qualities of that system, Jay?

  • Jason Soroko

    Obviously, in a lot of IT systems and enterprises, there’s gonna be systems that have been around a long time. If you walk around a modern data center of a modern enterprise, you are gonna have the slick new blades that do all the on-premises ERPs, a lot of stuff is in the cloud. And then once in a while in the data room you might find an old computer and this thing might have a keyboard that has colorations that are a little bit interesting. What’s even more interesting is what’s on that computer? What’s been installed? I think this is something you and I talked about previously, Tim. A lot of these things are very foundational. Database systems that might have been there from day one that cannot be taken down because everything is built upon them.

  • Tim Callan

    Everything is one of them. Right. They are the ground that everything sits on and sometimes, you know, sometimes everybody is afraid to touch them. Sometimes the people who wrote them don’t work here anymore and nothing was very well documented, and we are not sure we could successfully do a build. Right? We don’t have that environment.

  • Jason Soroko

    You know, Tim, I’ll tell you, we see this a lot more often in OT environments, industrial environments where if it doesn’t have a problem that needs fixing, do not fix it.

  • Tim Callan

    Yeah. Do not touch it. Do not be the one who broke it. Absolutely!

  • Jason Soroko

    Because even a very well-intentioned patch could be the kiss of death and goodness knows what kind of non-deterministic problem that even a very well-considered patch could cause. So, if we were to have some IT practitioners on the podcast with us right now, I’m sure many of them could talk about some foundational system within their enterprise where the system is, they know it’s fragile. They know it’s fragile for this reason but on the other hand, so many things depend on it and it simply has worked for so long that touching it has more potential cost than to do anything.

  • Tim Callan

    Yeah. Is perceived to be higher risk. Touching it is perceived to be higher risk than leaving it be. So, let’s say you’ve been assigned to do a crypto readiness exercise for your company coming up on this expiration. You might not scrutinize these systems as thoroughly as the stuff that you understand and is well-documented and is vendor supported. So, those wind up being very risky systems, but they are also in a lot of ways the most damaging systems when they do fail because they are foundational.

  • Jason Soroko

    Yes. And, you know, it’s like causing a failure due to a patch. People can potentially lose their jobs over that. But, on the other hand, things such as roots that have ten-year expiry dates, right? These kinds of things do need to be swapped out over time. If there’s a root store that might have been working for nearly a decade and it’s never had to have been touched, you know, it could also go out of mind very quickly that there are some things that may need to change on that fundamental system.

  • Tim Callan

    Yeah. Exactly. So, I’m a manager in some department and there is the rack that nobody ever touches that’s got a big sign on it saying do not touch and that was there when I started working here and nobody is entirely sure what it does and it just sits there and as long as the lights are on, we all, you know, say a little rosary and get on with our lives. Right? And so that’s sitting there in the corner and that’s often the stuff where the trouble happened over the weekend. And so, that’s hard. It’s hard. It’s easy to understand how even great IT departments got there and at the same time, it can be really damaging if that’s the system that goes out because it doesn’t have a usable trust chain anymore.

  • Jason Soroko

    You know, Tim, we’ve both been doing this for a lot of years and a lot of people might have wondered what the value in governance was. The extreme amount of governance people did in terms of PKI.

  • Tim Callan

    Yeah.

  • Jason Soroko

    And there was a good reason for a lot of that is because you must document what are the things that I do have to think about through time.

  • Tim Callan

    Yeah. Yeah.

  • Jason Soroko

    The beauty is, when it works, it works fantastic. But, you know, these things that go out of mind, these are the things that get you and this is the theme that you and I have been talking about repeatedly, which is you really cannot set up these things without some form of automation because you cannot store all of those things in your mind.

  • Tim Callan

    Right. And this is a great example of where automation is your friend. Right? Automation is your friend for several reasons. One is it just makes auditing easier. Right? You can just pull up a report and you see what you have. It also makes responding easier. So, if you do find yourself having a bad day, you are able to swap out certs more quickly or do what else you need to do to get the systems back up and running and so both of those reasons, you know, automation definitely is a best practice and it’s a best practice moving forward for sure.

  • Jason Soroko

    Yes. But obviously, of course, we do have systems that were from around the Y2K era and sometimes even well before that. And I give the example of the OT systems that might have been around 20/30 years. You know, financial systems and passport systems that were set up a long, long time ago. This is perhaps a time to review those dusty bits of documentation, find out which roots are out there and just be cognizant of it and maybe put some automation around it.

  • Tim Callan

    Yeah. So, one of the interesting things that I saw. So, there was writing. People were writing about this on their blogs and stuff and obviously, I had a busy weekend. I was reading what people were saying. And, you know, what’s interesting is there was a little bit of dialogue online and on the one hand we had some people who were aghast that a root certificate would be issued that would go for 20 years. They thought that was just ridiculously long and then there are other people who are saying things like why do these things expire at all. Right? And so, that just shows the broad set of attitudes that people might have toward these things. And these are IT professionals, right? They are maybe not PKI experts, but these are smart, educated people and, you know, some people not imagining that a root certificate could conceivably be secure in a 20-year timespan, but that is how long they have to last because if they don’t last that long it makes it really difficult for everything else in our systems to work. These are hard things to switch over. As I’ve said, they are very foundational. And, ideally, what is happening is everything is getting switched out over time. Right? So, that AddTrust root was a very old root. Later on, we added a newer legacy root called UserTrust which doesn’t expire until 2028 and then after that we got to our current root, which doesn’t expire until 2038 and you can see how this is going and they kind of leapfrog each other and what it does is it gives the relying parties, the relying systems enough time or the relying parties enough time to get all of those systems updated and swapped out. Right? If you literally got a 20-year lifespan and every 10 years the most contemporary root is the new one, then it eases that pain considerably. The good news is that with more than 700,000 businesses using our certificates around the world, the actual number that was affected was reasonably small, but any number bigger than zero is too many. Right? 
So, that’s the problem. We’d like to see a system situation where nobody was affected negatively by something like this.

  • Jason Soroko

    Yeah. Exactly right, Tim. So, what are the final outcomes and conclusions here, Tim, that I think are important for people to know?

  • Tim Callan

    I think we talked about, you know, embrace automation, as we always do. Like the certificates and automation that’s just the new world and embrace it. The second thing is you gotta find a way not to take a set it and forget it attitude toward your fundamental cryptographic decisions. Right? And this is what you get. You’ve got somebody somewhere along the lines set up these systems and then that someone didn’t think about it or maybe that someone was a PKI head but their successor wasn’t and somewhere along the line that just passed out of the institutional memory and you can’t afford for that stuff to pass out of the institutional memory. That’s when these problems happen. People need to be able to say I know - - these are all of my certs. I know where they are. These are all of my CAs. I know where they are. I know where my rules are. I’m following my rules. If I’m not following my rules, I take action to make sure I start following my rules. I know that my rules follow best practices. Right? You mentioned governance earlier. There needs to be governance around this. And that’s how you keep yourself from getting in trouble because roots aren’t going to stop expiring. It’s built into the fundamental architecture of PKI. Every cert expires. Without exception.

    So, you must have a system that can account for that reality and that’s about automation, visibility and governing its policies.

  • Jason Soroko

    And even if that governance is merely just an Excel spreadsheet that’s keeping track, and somebody is keeping track.

  • Tim Callan

    Sure.

  • Jason Soroko

    It must have some minimal level of what are the certificates that I have under my purview. You must have that inventory and if it’s something that can slip, even potentially in ten years and is such a critical system, you are going to have to put some automation around it. You are going to have to put governance around it. You are going to have to have a paper trail. Don’t let these things go to - -

  • Tim Callan

    Documentation. Yes. Absolutely. Yeah. And that must be part of what your central IT department understands as its job. And, you know, your job is to be ready. It’s like the, you know, the firefighters, their job isn’t just to sit around the firehouse and make spaghetti and do pushups. Their job is to be ready for the time that they all must jump in the truck and go save a building. And that’s your job too. Even when everything is going fine with the crypto, part of your job is to make sure that everything is in good shape so that things will keep going fine and if there is an unexpected event that you can deal with it.
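The "spreadsheet that somebody is keeping track of" can be given a small amount of automation. The following is a minimal sketch, assuming the inventory is exported as CSV with hypothetical columns `name`, `kind`, `owner`, and `not_after` (ISO dates); the rows are made-up examples, and only the May 30, 2020 date comes from the episode.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical inventory export; column names and rows are assumptions.
INVENTORY_CSV = """name,kind,owner,not_after
Legacy Root,root,pki-team,2020-05-30
Modern Root,root,pki-team,2038-01-01
billing-db.internal,leaf,db-team,2020-09-15
"""


def expiring(inventory_csv, today, window_days):
    """List inventory entries that are expired or expire within the window.

    Returns (name, kind, owner, status, days_left) tuples, soonest first.
    """
    cutoff = today + timedelta(days=window_days)
    report = []
    for row in csv.DictReader(io.StringIO(inventory_csv)):
        not_after = date.fromisoformat(row["not_after"])
        if not_after <= cutoff:
            status = "EXPIRED" if not_after < today else "EXPIRING"
            days_left = (not_after - today).days
            report.append((row["name"], row["kind"], row["owner"], status, days_left))
    return sorted(report, key=lambda entry: entry[4])


for entry in expiring(INVENTORY_CSV, date(2020, 5, 1), window_days=180):
    print(entry)
```

Including roots and intermediates in the same inventory as leaf certificates is the design point here: the entries that are renewed least often are the ones most likely to fall out of institutional memory.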

  • Jason Soroko

    Isn’t it interesting, Tim, where, you know, this is not really a technology problem as much as it’s a human nature problem.

  • Tim Callan

    Yeah. And it’s a business process problem and it’s tough. I get it. It’s tough. But that’s why IT professionals make the big bucks. Because they can do tough things. They are smart educated people who are able to make hard things work and this is a good example of that. And you have to hold yourself to that standard.

  • Jason Soroko

    You know, Tim, we are gonna talk about tough hard things in a couple upcoming podcasts. Just to allude to one of them, OpenSSH protocol which, I mean basically anybody who is doing remote access to Linux probably is not just aware of it but is using it on a daily basis. You know, they are deprecating SHA-1 finally - -

  • Tim Callan

    Yeah. Finally.

  • Jason Soroko

    So, what are you gonna do about that? These are the kinds of things where you couldn’t just put your feet up on the desk. Algorithms do in fact deprecate.

  • Tim Callan

    And. And we almost made it through a podcast without talking about quantum computers, Jay, but you had to go and jinx us. And, of course, we all know that everything is gonna have to get swapped out and those algorithms are gonna come, gee, maybe even the first ones this calendar year. By next calendar year and you are gonna want to start using those.

  • Jason Soroko

    Well, if you think about it, Tim - - here’s an example. Maybe just to end the podcast, right. Here’s a thought for you. Just the human mind plays tricks on you with time and because we are talking about timespans of ten years, decades, right, I’m gonna admit to all of the audience here how old I am but I was born 27 years, 28 years after the end of World War II and, you know, that doesn’t seem like a long time. 28 years doesn’t seem like a long time. Well, believe it or not, I wrapped up university 28 years ago, Tim. And if you were to ask me, you know, was World War II that far - - I’d say well that’s an ocean of time away but when I finished university that wasn’t that long ago.

  • Tim Callan

    Yeah.

  • Jason Soroko

    Well, no. They were the same amount of time. So, think about this. Right. That AddTrust root which was, you know, started up ten years ago.

  • Tim Callan

    20. 20 years ago. Yes.

  • Jason Soroko

    In less than half of the amount of time that that root was alive, we may have the complete deprecation of RSA and ECC.

  • Tim Callan

    Right. Exactly. Precisely.

  • Jason Soroko

    So, this is where the human nature and the processing we do of time, you can’t let your human folly - - even for me, it’s hard to believe these things because my brain can’t do it but the arithmetic tells me that that’s the truth.

  • Tim Callan

    Yes. And it is. And we will both still be working when that next root expiration that I mentioned earlier comes around. Right? You and I will have jobs and we will be watching in interest to see what happens. And so, that’s the other thing. You might say you set up something. You are ah, ten years, I don’t need to worry about that. Well, if everything goes great and you are really succeeding and your company is wonderful and everything else, it might be your problem in ten years.

  • Jason Soroko

    And, Tim, the industry has moved on enough that, you know, we are not just worried about a bit of COBOL breaking in a Y2K situation in 2000. We now have fully capable systems for automation for roots, for certificates. PKI has moved on and if it’s something that you think you might have a problem with, let’s call in the experts and we can help with that.

  • Tim Callan

    Yeah. I think that’s a great spot to leave it, Jay. So, thank you as always for a stimulating conversation.

  • Jason Soroko

    Thank you, Tim. Yeah.

  • Tim Callan

    And this has been Root Causes.