Podcast Jul 14, 2020

Root Causes 106: Massive Intermediate Certificate Distrust Is on the Way

A recently identified and widespread configuration error has created a situation where, with the right attack on certain public roots, certificates could become essentially unrevocable. As a consequence, 14 public CAs will have to revoke their OCSP certificates, many of which are also intermediates, and permanently discontinue use of their keys. That leaves millions of active SSL, S/MIME, code signing, and document signing certificates in need of immediate replacement or they will be distrusted. Join our hosts as they explain what the problem is and what messy cleanup will be required to address it.

  • Original Broadcast Date: July 14, 2020

Episode Transcript

Lightly edited for flow and brevity.

  • Tim Callan

    Today, we want to talk about a very big piece of news in the public digital certificate community. I’ll say a big piece of news that has been surprisingly absent from the media and the social media that normally cover this kind of thing, you know, very closely, but something extremely important that’s going on.

  • Jason Soroko

    Yeah. I think a lot of the people who cover these things might be on vacation right now because normally they’d be all over it.

  • Tim Callan

    Yeah. You’d think. And what we are doing is we are talking about a rather large compliance and security problem that spans 14 different public CAs, that was brought to the public’s attention at the beginning of this month, the beginning of July, and that certainly in its scope and reach will affect millions of certificates across many, probably all, of the forms of public certificate available today.

  • Jason Soroko

    That's big.

  • Tim Callan

    It’s big. So, what happened was, on July 1, the engineer who is in charge of Google’s root store program, a guy by the name of Ryan Sleevi, published a bug on the Mozilla board, which is where these things are traditionally discussed, in which he had identified, again, 14 CAs that had incorrectly configured OCSP responder certificates. Now, we have talked about OCSP in the past. This is one of the revocation checking mechanisms that we have, and it’s far and away the most robust, most used, and most useful of them. In addition to that, we have also discussed revocation, how it works, and what matters about it. But basically, due to this configuration error, without getting into all of the gory details, it is possible for a bad actor to essentially spoof an OCSP responder. And what that would mean, if I’m spoofing an OCSP responder, is I could send back responses that said that we were all clear for OCSP even if we weren’t, even if the real responder would have rejected the query. And as a consequence, essentially what this does is this renders, or potentially renders I should say, a certificate unrevocable.
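
The episode skips the gory details, so what follows is only a rough sketch of the kind of check involved. It assumes, going beyond what is said here, that the misconfiguration was a certificate carrying the OCSP-signing extended key usage without the `id-pkix-ocsp-nocheck` extension, often on a certificate that was also a CA. The `Cert` class, the `is_risky_responder` function, and the sample certificates are hypothetical illustrations, not any CA's tooling or a real X.509 parser.

```python
from dataclasses import dataclass, field

OCSP_SIGNING = "id-kp-OCSPSigning"      # extended key usage that lets a cert sign OCSP responses
OCSP_NOCHECK = "id-pkix-ocsp-nocheck"   # extension assumed to be required on responder certs


@dataclass
class Cert:
    """Toy stand-in for a certificate (hypothetical model, not a real parser)."""
    subject: str
    is_ca: bool
    ekus: set = field(default_factory=set)
    extensions: set = field(default_factory=set)


def is_risky_responder(cert: Cert) -> bool:
    """Flag certs that may sign OCSP responses but lack the no-check extension."""
    return OCSP_SIGNING in cert.ekus and OCSP_NOCHECK not in cert.extensions


# Made-up sample data for illustration only.
certs = [
    Cert("Example Issuing CA", is_ca=True, ekus={OCSP_SIGNING, "serverAuth"}),
    Cert("Example OCSP Responder", is_ca=False,
         ekus={OCSP_SIGNING}, extensions={OCSP_NOCHECK}),
    Cert("Example TLS CA", is_ca=True, ekus={"serverAuth"}),
]

# Keep the CA flag next to each hit, since a flagged cert that is also
# an intermediate is the case with the widest impact.
flagged = [(c.subject, c.is_ca) for c in certs if is_risky_responder(c)]
print(flagged)
```

In this toy data only the first certificate is flagged, and it is also a CA, which mirrors the situation described later in the episode where the affected certificates had issued other certificates as well.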

  • Jason Soroko

    Yeah. I think that final point was the main point that was being made and that was serious enough when that news broke.

  • Tim Callan

    Right. So, we are gonna walk through this because this has developed in stages. Where it all started was that Ryan Sleevi posted this bug that essentially says these certs, you know, are exposed in the right attack, and an attacker would have to be able to do several things. They’d have to get your key. They’d have to probably compromise your DNS. They send you off to the wrong place, and then when you get there, even if someone knows that this attack is going on where you are being tricked or fooled or something, if they revoke the cert in this scenario the attacker could basically shut that revocation down, and anybody with that poisoned DNS would just go right on getting the wrong responses regardless of what the actual certificate holder or the CA tried to do about it. So that’s terrible, right? That’s a very bad outcome, cause revocation as it is is just about our last line of defense. Our last line of defense is distrust of the root. So, gee, not being able to revoke certs is a bad situation to be in because the next step is to just plain stop trusting the root at all. And so this was identified, again by Mr. Sleevi, as two things. Number one, an unambiguous compliance error: how these certs were to be configured was clear and unambiguous in the guidelines, and they had not been done that way. But, more importantly in his mind, according to his words on the public forums, it was an unacceptable security risk because of the high impact of an actual incident. There’s no evidence to suggest that such an incident has occurred. Now, if it had occurred we might not know about it, but if such an incident were to occur then the impact would be just gigantic. And so, you know, Sleevi was clear, which is, it is a compliance failure and therefore you must follow the compliance guidelines and I’m gonna hold you to that, but the reason I’m being such a stickler is because this is an unacceptable security problem. So, let’s pause there and make some comments, Jay.

  • Jason Soroko

    Yeah. Sure, Tim. So, I’m thinking about that. I think it does make sense. You know, it’s very clear what the bad guy has to accomplish. It’s not trivial. This isn’t a - -

  • Tim Callan

    It’s not trivial.

  • Jason Soroko

    The US-CERT team does a pretty good job a lot of the time of saying, hey look, you know, this is so impossibly difficult, it’s Tom Cruise Mission Impossible kind of stuff. Or, my goodness, a script kiddie could, you know, probably already has ten scripts for how to do this. This is probably at the higher end.

  • Tim Callan

    This is an advanced attack. For sure.

  • Jason Soroko

    Yeah. But on the other hand.

  • Tim Callan

    But an advanced attack with potential consequences that are enormous.

  • Jason Soroko

    But not impossible. But not impossible. And the problem of course being the outcome if the attacker was able to jump through all those hoops, then the outcome is just unacceptable.

  • Tim Callan

    Yeah. The outcome is enormous. So that was the baseline. That’s where it all started. This is July 1. So, the Wednesday before the Fourth of July weekend. Shortly thereafter, possibly still on the evening of July 1, we get the full list of affected CAs, again from Mr. Sleevi. It’s 14 public CAs. Shortly after that, various people using the crt.sh tool, which was created by Sectigo’s own Rob Stradling and is maintained by Rob, managed to identify several interesting things. The first of which is that many of these certificates which were used to sign OCSP responses were also being used for other things, and the big thing among those other things was to sign intermediates, and those intermediates in their turn were signing very large numbers, we’re talking millions, of leaf certificates, end-user certificates. And not just TLS certificates but all forms of public certs. There were code signing certs. There were S/MIME certs. There were document signing certs in there.

  • Jason Soroko

    That’s the part that really surprised me, Tim, was just the wide-ranging number of certificates that were implicated here. It ended up being a lot of certificates.

  • Tim Callan

    Yeah. It’s something like 200, more than 250. 275-ish intermediates across these 14 CAs were identified as being essentially no good. Now, another key part of this that we haven’t even gotten to yet is that because these certificates are essentially unrevocable, the practical consequence is that the keys cannot be reused. So normally when there’s a certificate problem, what you can do under most circumstances is issue a new certificate using the same set of keys, and therefore you can swap in the new cert and nothing else needs to change. So, for instance, if somebody had a typical compliance problem with an intermediate, they would have a certain amount of time to solve that by changing out the intermediate. They could change out the intermediate, and the leaf certificates that had been deployed on machines all around the world would not need to change. They would continue to work. However, due to the nature of this problem, the only way to mitigate it is to cease using that key pair, and therefore one of the things that Ryan Sleevi is going to require from the CAs that had this problem is what is called documented key destruction. I think it was documented, verified key destruction. Something like that. I could look it up. But basically, he wants credible evidence that he can believe that not only have the certs been revoked, which is easy to check, but that the keys cannot be used again. That means that the new intermediates need to be created with new keys, which means that the existing leaf certificates will stop working. They will not roll up the trust chain correctly, and what that means is that all of these leaf certificates, these millions and millions of certificates that are out there on systems all over the world, where the CAs don’t even necessarily know where or how they’ve been deployed, will fail when these intermediates are revoked.
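
The difference between reissuing an intermediate on the same key pair and replacing it with a new key pair can be sketched with a toy model. This is a deliberately simplified illustration: real path validation also checks names, validity periods, and constraints, and the `Intermediate`, `Leaf`, and `chains_to` names and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    name: str
    key_id: str          # identifies the key pair the intermediate signs with


@dataclass(frozen=True)
class Leaf:
    name: str
    signed_by_key: str   # key pair that produced the leaf's signature


def chains_to(leaf: Leaf, intermediate: Intermediate) -> bool:
    """In this toy model a leaf verifies only if the signing key pair matches."""
    return leaf.signed_by_key == intermediate.key_id


# A deployed leaf, signed long ago with key pair "key-A" (made-up values).
leaf = Leaf("www.example.com", signed_by_key="key-A")

# Ordinary fix: a new intermediate certificate on the SAME key pair.
reissued_same_key = Intermediate("Example CA (reissued)", key_id="key-A")

# The fix required here: the old key is destroyed, so a NEW key pair is used.
replaced_new_key = Intermediate("Example CA (new key)", key_id="key-B")

print(chains_to(leaf, reissued_same_key))  # True: the leaf keeps working
print(chains_to(leaf, replaced_new_key))   # False: the leaf must be reissued
```

The second result is the painful case: once the old key pair is retired, every leaf signed with it has to be replaced, no matter where it is deployed.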

  • Jason Soroko

    Yeah, Tim. That is something else. When you are having to – just the thought of some of these big intermediate CAs having to be not just them themselves taken offline but all the leaves associated with them. You are talking about very large; you know, I’m thinking right now in the movies when large chunks of a large city the lights go out one at a time.

  • Tim Callan

    Yeah.

  • Jason Soroko

    It’s almost that kind of feeling.

  • Tim Callan

    Yeah. It is that kind of feeling, and so, not surprisingly, there was a decent amount of initial pushback from CAs saying, no, this is extreme; no, we can’t do this. Think about the end users that are gonna suffer. And, you know, they started to pull out the normal arguments: these are critical systems and government sites and transportation systems and cyber defense, all of these things will go down, and the consequences of these things going down will be vast and sweeping. Here again I think Ryan Sleevi is pretty clear on his position, and this is where it’s good to understand how the CA/Browser Forum works. There are CA/Browser Forum rules, and if you do not follow the rules, that is a non-compliance incident. So Ryan is clear that anybody who does not revoke the intermediates in the specified timeframe is having a non-compliance incident, and he is not flexible on that, and there are certain rules about what you need to do when you have a non-compliance incident. Now at the same time a non-compliance incident does not carry with it an automatic death penalty, because there is recognition of the fact that circumstances might occur where non-compliance is the best choice or the only choice. However, it is considered to be a meaningful failure on the CA’s part, and it is noticed as such, and part of the reason that distrust occurs, I’m thinking about the Symantec case, I’m thinking about the Certinomis case, is too much non-compliance. So a non-compliance incident definitely really, really matters, but it isn’t necessarily a death penalty. So, you know, Ryan has been clear that everybody who doesn’t have their intermediates revoked in the specified time will have a non-compliance incident. Now, are you ready for the kicker, Jay? You want to know what the specified period is according to the guidelines?

  • Jason Soroko

    I am waiting for that. That’s great.

  • Tim Callan

    It’s seven days.

  • Jason Soroko

    Seven days?

  • Tim Callan

    So, July 8. These people in principle had seven days either to swap out millions of certificates before revoking these intermediates, or to revoke the intermediates and cause people to go offline. This is impossible, right? You can’t get people to change out their certs under the best of circumstances. Let alone when you are suddenly surprising them. Let alone when you are suddenly surprising them the day before the long Fourth of July weekend, the day after Canada Day if you happen to live north of that border, and at a time when huge numbers of people are going on vacation, especially since we’ve all been in this weird COVID-19 situation and there’s some amount of reopening and a lot of people are choosing that time to take their vacations. It was just a rough, rough, rough time for people to be dealing with this. So, not all of these intermediates were revoked. Some of them were. It looks to me, to my eyes, like most of them - - well, more than half of them were not, and of course, this is an issue. Number one, it’s the compliance issue, but number two, there is this underlying security concern.

  • Jason Soroko

    Yeah. So, Tim, I’m trying to think now, at the top of the podcast you mentioned 14 CAs affected. That’s not all of them. Right?

  • Tim Callan

    No.

  • Jason Soroko

    There’s a lot more CAs than that.

  • Tim Callan

    There’s like more than 100 in the Microsoft root store. So yes.

  • Jason Soroko

    So, 14 is a limit - - you know, 14 is a finite number within the CA world.

  • Tim Callan

    Yeah. But some of them are very, very big to be clear.

  • Jason Soroko

    Right, right, right. But was it - - I’m assuming it wasn’t all the big ones?

  • Tim Callan

    Well, it wasn’t us. If that’s where you are going with that. So, to be clear - - but you know what, that’s a good point, Jay. We should make that. If you are listening to this podcast and you are a Sectigo customer and you are thinking Oh God, I gotta go do something about my certs, you don’t. Your Sectigo certs are fine. So, let’s just get that clear so everybody knows that. It wasn’t - - well, how to put this. There are four CAs that between the bunch of them are doing more than 90% of the global volume and two of those four CAs were affected.

  • Jason Soroko

    That’s interesting, Tim.

  • Tim Callan

    So big numbers.

  • Jason Soroko

    Yeah. Sometimes when you hear about these things, I mean we’ve heard all kinds of wobbles in the CA industry and a lot of times it’s really some small players who didn’t quite know what they were doing. And we’ve seen those kinds of problems before but that doesn’t sound like it’s characterized like that. It does include some of the larger ones.

  • Tim Callan

    Yeah. It includes people who had access to the information they needed to know how to do this, had the technical acumen or needed to have the technical acumen to know how to do this and who have been issuing certs since well before that rule was put in place. Who have been issuing certs since well before OCSP was a thing. Before OCSP was real. So, if that’s where you are going with this question, Jay, that’s the answer.

  • Jason Soroko

    Yeah, because that leads me really to what my real question is because this podcast is not about a who, it’s more of a how than a who question that we are trying to solve here.

  • Tim Callan

    Yeah.

  • Jason Soroko

    The reality, though: was this something where, you know, insight from the past might have led you to not make this mistake? In other words, is this avoidable? Was this an avoidable problem?

  • Tim Callan

    Is this an avoidable problem? Yeah. And yes. The answer really is yes. There is no other way to put it, cause for starters lots and lots of CAs did avoid it, and second, it was right there in the guidelines. It was in the guidelines in black and white. It was in the guidelines since version one, and public CAs are responsible for following the guidelines. Now I’m sympathetic to these guys. I’m reminded of the 63-bit entropy problem that you and I discussed last year when it occurred. Similar thing.

  • Jason Soroko

    That’s exactly what comes to mind, Tim.

  • Tim Callan

    Yeah, and lots of certs had to be revoked and replaced, cause in that case the entropy problem was in the certs. The certs themselves had to change, right. It wasn’t this root problem. But what that had going for it, in terms of sort of people giving themselves a break, was that it was a popular tool that was widely used that was not behaving the way that it was said to behave, and it was just a function of people not really looking too closely at it to realize that one of the digits was always the same, and you know, at least you understand how that comes about. This is a different situation. It seems that ultimately, somewhere along the line, someone made a technical or configuration decision without understanding all the parameters they needed to follow. Pure and simple. So now let me throw one more wrinkle into this incredibly wrinkly situation we have, which is, I mentioned early on that this affects more than just TLS certificates. So in the world of TLS we’re used to revocation and replacing. That’s what OCSP is for. That’s what CRLs are for. Those are SSL mechanisms. But in this case, these intermediates that must be replaced have been used to sign other kinds of certificates including S/MIME, and there isn’t a mechanism for, let’s say, an email system to check if an S/MIME certificate has been revoked. So, there isn’t an equivalent of OCSP for S/MIME. So, these intermediates are gonna stop being trusted. When the intermediates stop being trusted, the OSs, right, your Apple, your Windows, your Linux, will stop trusting the root. They will stop trusting the S/MIME cert, and there is no way in that whole system to say this certificate has been revoked. So in the case of the S/MIME certs, people have to somehow reach out to whoever is using the S/MIME certs, wherever they are, and say you have to uninstall this S/MIME cert from your email client and install a new S/MIME cert with the new intermediate. So that is just a whole new nightmare that none of us has been anticipating or expecting. I mean, that’s arguably gonna be the worst set of certificates of all to get replaced, and people are gonna be out signing their mail, thinking they’re signing their mail, with certs that aren’t trusted, and then individual client systems will fail in different ways depending on how they behave under those circumstances, and it’s just gonna be ugly for those people.

  • Jason Soroko

    Geez, Tim, I’m thinking now of the implications even for email storage.

  • Tim Callan

    Sure.

  • Jason Soroko

    With S/MIME certificates that have essentially become untrustable.

  • Tim Callan

    What a great point. I didn’t even think about that, Jay. That’s an interesting question and I don’t know the answer. Do you? But if the root becomes untrusted, can I access my old stored emails still?

  • Jason Soroko

    I think technically yes.

  • Tim Callan

    Ok.

  • Jason Soroko

    Simply because there is no check. If the check mechanism isn’t happening - - in other words, if you essentially have the key to the door, the door will open.

  • Tim Callan

    Right.

  • Jason Soroko

    If the door accepts that key and from my understanding you can probably configure it to open the door if you possess the correct key.

  • Tim Callan

    So, let me make sure I’m asking this question right because I really am ignorant about this. This is an interesting one. So, I sent an email a month ago when these roots were trusted, and it got stored locally and it’s stored encrypted because I sent it with an S/MIME cert and I can open it up or someone else can open it up. Let’s talk about a key vaulting scenario. Right. My keys have been vaulted so someone else can open it up. Let’s say if I lose my laptop. Now the intermediate gets distrusted, that old S/MIME cert that was signing these emails no longer has a trusted chain up to a trusted root. Now someone goes in and needs to open my archived email later in the future using their key vaulting software and their vaulted key, does that decrypt the stored email?

  • Jason Soroko

    I think it’s more of a matter of possession. If you possess it, then you can -

  • Tim Callan

    Right. If you have the old key, you still have the key.

  • Jason Soroko

    So, in other words, the configuration probably is not hard set to say, do not open this email if it’s been revoked.

  • Tim Callan

    Right. Cause you have the key, so you have the key. Ok. So that’s a little tiny bit of good news in this whole giant mess. So, I never even thought that through. I’m glad we discussed it. So anyway, this is developing in real time. At one point one of the major CAs said that it would take them, I think it was 8 or 10 months, to get all these things revoked. Now other people blew up at that. So they subsequently published a message saying we are working on bringing that number down, but it’s just a mess and it’s going to be a mess for a long time, and there’s going to be fallout, and a lot of people are going to have trouble. They’re gonna have projects, or they’re gonna have a really bad day when things stop working, and this is going to be a big impact on a lot of individuals for a long time, and it’s going to be hard to ever truly document and characterize exactly everything that goes on with that. So, there you are.
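
The hosts' tentative reasoning here, that reading stored mail depends on possessing the key while trust depends on the chain, can be sketched as two independent checks. This is a toy model of that reasoning only, with hypothetical function names and made-up identifiers; actual mail clients and key-vaulting products may behave differently.

```python
def can_decrypt(recipient_key_id: str, held_private_keys: set) -> bool:
    """Decryption, in this model, needs only possession of the matching private key."""
    return recipient_key_id in held_private_keys


def chain_is_trusted(issuing_intermediate: str, distrusted_intermediates: set) -> bool:
    """Trust, in this model, fails once the issuing intermediate is distrusted."""
    return issuing_intermediate not in distrusted_intermediates


# Made-up example: a vaulted key and an intermediate that has since been distrusted.
vault = {"smime-key-2020"}
distrusted = {"Example Intermediate CA"}

archived_message = {
    "recipient_key_id": "smime-key-2020",
    "issuer": "Example Intermediate CA",
}

readable = can_decrypt(archived_message["recipient_key_id"], vault)
trusted = chain_is_trusted(archived_message["issuer"], distrusted)
print(readable, trusted)
```

Under these assumptions the archived message remains readable even though the certificate no longer chains to anything trusted, which is the "little tiny bit of good news" described above.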

  • Jason Soroko

    That’s interesting, Tim. I’m glad we were able to talk this through. A couple of things were interesting to me here. Ryan Sleevi’s assessment of the risk from a security standpoint, I think, holds. And I think also, for PKI people who are interested in the topic, especially in public trust, the fact that these intermediate CAs have to be distrusted at the root level is, you know, fundamentally the big problem here, and the fact that we are now dealing with types of certificates that are not just TLS, this is really something. It’s a big issue.

  • Tim Callan

    This is a first in the world of public PKI. Like there has never been an episode like it, and yeah, I think back to some of these other major episodes. The 63-bit entropy problem, the Symantec root distrust, various other things that went on, huge and important, but this one has some unique characteristics that I dare say we never thought we were gonna see in the real world. And yet here we are.

  • Jason Soroko

    Middle of the summer 2020. Things never seem to get dull.

  • Tim Callan

    2020 is going to throw you lots of curve balls. There’s no question about that. So, I think we will probably return to this story sometime in the future when a lot more consequences have shaken out. I don’t think we’re gonna do a weekly update because it’s all just gonna be - - it needs a little time to cook, but I think we’ll be returning to this.

  • Jason Soroko

    And Tim, I think we have a podcast on the root store program.

  • Tim Callan

    Yeah, I think you are right. I think explaining root stores, how they work, why they do what they do is a smart future podcast because that’s a lot of the context that also makes this make more sense. There is a CA/Browser Forum element to this which we’ve discussed but there is also a Mozilla and Google and potentially other root stores aspect to this as well that’s very important. So that’s a good topic to return to in the future, Jay.

  • Jason Soroko

    Thanks, Tim. I kind of enjoy these here’s a big item in the news and then here’s a future podcast that explains what the heck the components are behind that news.

  • Tim Callan

    Right. We always have lots of future podcasts. We have no shortage of things to discuss but maybe a good place to leave it today and, again, probably in some months we’ll return to this.

  • Jason Soroko

    Thanks a lot, Tim.

  • Tim Callan

    Alright. Thank you, Jay. Thank you, listeners. This has been Root Causes.