Root Causes 128: What Is Total Certificate Agility?
First we had crypto agility, which is how we ensure our cryptography stays current with the needs of security. Expanding on this concept, industry leaders are now looking at certificate agility, which is building our systems so that all certificates are known, current, and immediately replaceable. Our hosts explain certificate agility, why it's important, and what you need to do to achieve it.
- Original Broadcast Date: November 12, 2020
Episode Transcript
Lightly edited for flow and brevity.
-
Tim Callan
So, today, we want to talk about a new industry phrase, which is total certificate agility.
-
Jason Soroko
Total certificate agility. I’ve heard of crypto agility. So, what’s this about?
-
Tim Callan
So crypto agility was actually the foundation stone for this new concept. The basic idea behind crypto agility is that you must have cryptographic systems that allow you to make the changes that need to be made in real time in order to keep your system secure. One of the big places where that cryptography gets kind of locked in is certificates. Lots and lots of certificates. Different form factors, different places, different durations. The number of certificates out there in the world, and in a normal large enterprise, can be really vast, and some of these certificates can hang around for a long time. Even if you think of public TLS certs, which are down to one year now, a year is still a long time. And there are other kinds of certs, such as private CA certs, code signing certs, and S/MIME certs, that can float around for years.
And so, the idea behind certificate agility is that, pragmatically, in the real world, sometimes these certificates have to change. Your typical enterprise has lots and lots of certificates in all kinds of different form factors and durations, and they show up in different places and different environments in your organization. So, building on that foundation stone of crypto agility, you need to have the same agility for your certificates. There may be reasons why these certificates need to be changed, and under those circumstances it can be difficult for companies to do that. Sometimes certificates have to be manually updated. Sometimes they are not even known. Right? Sometimes companies don’t know exactly what certificates they have or don’t have. They don’t have a single repository of information. They don’t have a single pane of glass. And so, under those circumstances, companies get caught flat-footed.
Companies have bad certificates or certificates that must be revoked, and they have outages because they are not able to adjust to that and deal with that in a timely fashion. Or, again, they are not even aware of these certificates until the outage occurs. And so, a concept that we are trying to promote is take the idea of crypto agility and extend it to certificate agility.
-
Jason Soroko
Alright, Tim. So, you know, I’m thinking of the IoT realm. And I’m thinking of the movement in IoT toward certificates with shorter lifespans, which necessitated better certificate management and provisioning for those kinds of devices. And it almost sounds like, even though systems such as web servers and load balancers do not have the same kind of constraints that an IoT device does, those are the kinds of things where, if the process of swapping out certificates is manual and something happens to those certificates, the level of effort to quickly respond to that, especially on a mass scale, is quite high.
-
Tim Callan
Yeah.
-
Jason Soroko
Meaning that, you know, it’s the same thing as in IoT, where you are gonna have high-scale, difficult-to-get-to, difficult-to-swap-out certificate environments. We don’t typically think of web servers and load balancers that way but it’s true.
-
Tim Callan
Yeah. IoT is a great example of kind of your ultimate low-agility scenario, often. Right? These are many times the fire-and-forget devices where nobody has built in any kind of agility at all, if they even have a certificate in the first place, as you and I have discussed in the past, and then there is probably no mechanism for upgrading or updating that certificate. But even in the world of things that we think should be very agile, like web servers, we are running into this problem all the time, where people say, oh, you know, I have 500 certs and they are sitting somewhere on servers and someone’s gotta touch each of them, and now we don’t feel like we trust our keys anymore, or we reused the same cert across 500 servers, and now somebody has gotta go do it and there is not enough time to get it done.
So, this happens in all kinds of different places and all kinds of different ways and many times it’s avoidable. Right? It’s avoidable primarily through two mechanisms. One of which is automation. You and I love to talk about automation, and this is part of the reason why. And the other one is just visibility. That’s the other thing we talk about a lot, which is knowing what you have and being able to manage it in an automated way that doesn’t involve a human going and installing a file on 233 separate machines.
-
Jason Soroko
Yeah, and that visibility also allows you to perform governance as well. If you have a certain set of policies that you want to have in place, such as, you know, there is a specific certificate profile that I want to use, or I only want to use ECC vs. RSA, whatever, having that kind of visibility allows you to enforce that. You and I have both had podcast talks about other things, such as the other forms of certificate profiles in public trust that are very important, that most people just ignore but that should never be ignored, and you always need to have at least an eye on them, an automated system looking at them. So, visibility is quite a wide topic and it’s very, very important. I agree, Tim.
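Editor's illustration: the governance idea Jason describes, checking every known certificate against a required profile, could be sketched roughly as follows. This is a minimal sketch and not something discussed in the episode. The names CertProfile, CertRecord, and policy_violations, and the example limits, are all hypothetical, and a real system would read these fields from the certificates themselves rather than from hand-built records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertProfile:
    """Hypothetical policy: which key algorithms and lifetimes are allowed."""
    allowed_key_algorithms: frozenset
    max_validity_days: int


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical inventory row describing one known certificate."""
    common_name: str
    key_algorithm: str
    validity_days: int


def policy_violations(cert, profile):
    """Return a list of human-readable policy problems for one certificate."""
    problems = []
    if cert.key_algorithm not in profile.allowed_key_algorithms:
        problems.append(
            f"{cert.common_name}: key algorithm {cert.key_algorithm} is not allowed"
        )
    if cert.validity_days > profile.max_validity_days:
        problems.append(
            f"{cert.common_name}: validity of {cert.validity_days} days exceeds "
            f"the {profile.max_validity_days}-day limit"
        )
    return problems


# Example policy (assumed values): ECC keys only, at most 398 days of validity.
profile = CertProfile(frozenset({"ECC"}), max_validity_days=398)
inventory = [
    CertRecord("www.example.com", "ECC", 90),
    CertRecord("legacy.example.com", "RSA", 730),
]
for cert in inventory:
    for problem in policy_violations(cert, profile):
        print(problem)
```

A check like this only works for certificates that are already in the inventory, which is why visibility comes first.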
-
Tim Callan
Visibility is hugely important. We probably don’t talk about it enough. It might be worth our while to get down in the details on that somewhere along the line. But, so, those two things, as the foundation stones, create your agile certificate system.
Then the last word, or rather the first word, total: that’s kind of the ideal. So, obviously, you could have a spectrum of certificate agility inside your organization, and a great thing to aspire to is total certificate agility. An easy working definition of total certificate agility is that every single certificate in your environment is identified and is replaceable within a certain short, specified timeframe. Let’s say 24 hours or 5 days or a week. And in that timeframe you could swap out 100% of your certs if, heaven forbid, that is what had to happen. And it’s not that anyone is expecting that to happen, but if you hold that up as the ideal, then you find that the normal one-a-day problems that do occur are easy-peasy.
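Editor's illustration: Tim's working definition can be read as a simple measurement over a certificate inventory. The sketch below is not from the episode. InventoryEntry, agility_ratio, and the sample replacement times are hypothetical. It also assumes each known certificate has an estimated time-to-replace, with None meaning nobody knows how long replacement would take. Certificates missing from the inventory cannot be counted at all, which is the visibility problem discussed above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class InventoryEntry:
    """Hypothetical inventory row: a known certificate and its replace time."""
    name: str
    replace_time: Optional[timedelta]  # None = no known way or estimate


def agility_ratio(inventory, threshold):
    """Fraction of known certificates replaceable within the threshold.

    A ratio of 1.0 corresponds to "total" agility for the known inventory.
    An empty inventory is treated as 1.0 (nothing to replace).
    """
    if not inventory:
        return 1.0
    agile = sum(
        1
        for entry in inventory
        if entry.replace_time is not None and entry.replace_time <= threshold
    )
    return agile / len(inventory)


# Example with assumed values and a five-day threshold.
inventory = [
    InventoryEntry("web-frontend", timedelta(hours=2)),   # automated renewal
    InventoryEntry("load-balancer", timedelta(days=3)),   # scripted, some manual steps
    InventoryEntry("code-signing", timedelta(days=30)),   # manual process
    InventoryEntry("old-appliance", None),                # nobody knows
]
print(agility_ratio(inventory, timedelta(days=5)))
```

Tracking a number like this over time fits the incremental approach: each certificate moved under automation raises the ratio, even if it never reaches 1.0.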
-
Jason Soroko
You know, Tim, I’m just trying to think of the implications of this total certificate agility. And, of course, you and I have both talked quite a lot about the trend towards shorter certificate lifespans overall in both public and private trust and what I’m thinking is that’s a good trend from a security standpoint. We agree with it but what would be a pain would be the need for ultra, ultra-short certificate life cycles.
-
Tim Callan
Yeah.
-
Jason Soroko
And I think that total certificate agility is what allows us to have that flexibility. In any use case, even if it’s in public trust, you might say to yourself, well, you know, maybe I want less than 90 days. But it’s just such a pain, because I know I have to renew even though, in the majority of 90-day cycles overall, I’ll never have to do anything because there’s no security issue. But it’s that one 90-day quarter that’s gonna get you. And, so, total certificate agility allows you to have a reasonable 90-day or one-year public SSL certificate, and if something ever comes up, you can very, very quickly swap it out. And that’s a handy thing to have.
-
Tim Callan
Yeah. Absolutely. And so, I think you are right on the money with that. So, go with a little thought experiment with me here. Let’s say that our threshold was five days. Five days is a big one in the world of public SSL certificates. So that’s why it’s a good one to pick. Let’s say our threshold is five days and we said we need 100% of our public SSL certs to be able to be swapped out within five days. Well, I’ll tell you, one way to do that is to issue five-day certs. Right? And under those circumstances you wouldn’t need any agility at all because they’d all be replaced within five days. But, of course, at that point you need to be in a pure automation environment anyway and you haven’t really accomplished anything. So, when you start to say, ok, I’m going to get that automation in place and get, again, that inventory of my certificates in place, the consequence of that is you can be safe using 90-day certs or one-year certs for that matter, and it doesn’t matter because you know if the button must be pushed, the button can be pushed.
-
Jason Soroko
You know, Tim, there is also a question we could ask, which is how did we get here and why is this needed so badly? And, you know, I could be accused on this podcast of speaking quite harshly about the good old Linux administrator in an enterprise IT environment, who is doing a yeoman’s job, but really, I’ve worked with a lot of these folks, and I’ve even been that guy in the past in various other capacities. And one of the big mistakes that I see that culture of people make is to say, look, I want complete control and I’m competent and I know how to do this. So, I’m gonna repeat this manual step over and over as it’s needed.
-
Tim Callan
Sure.
-
Jason Soroko
And I’ll just keep track of it. I’ve got this handy-dandy spreadsheet and when time comes up, I’m good. But that person also forgets that they might go on vacation, they might be furloughed, they might - - I mean COVID has taught us a lot of things.
-
Tim Callan
They might get the flu. They might make an error. Yeah. This is the same reason that people who are white-knuckle passengers are perfectly calm when they are the one who is driving the car. It’s this illusion of control when you have your hands on the levers, even if it turns out that the other person who might have been piloting the car is more competent than you are. And so you have a similar situation here. People want to have their hands on these levers, but they are actually worse at it than competently written software is.
-
Jason Soroko
Tim, you know, a big part of the problem here is that the average Linux administrator does not appreciate the cost of their own time the same way that a CEO would. Or a CFO. And they also do not appreciate the fact that they can’t scale the way that well-written software can, and that’s something that’s on the mind of a risk officer, a CIO, or a CSO.
-
Tim Callan
Yeah, and now you are getting to the other factor in all of this, which is human psychology. Right? You’ve got this person who has been doing this for their entire career and maybe they are a little risk averse. Maybe they are afraid that if this part of their job goes away they will be deemed expendable.
Now the irony, of course, is that there is this massive IT skills gap, and most of the IT professionals that I know would rather learn new things and get better and use new technology and give themselves new challenges than continue to do repetitive cookbook work that could be done by software instead. But then, on the other side of that, you have this: but I’m the only person here who knows how to do this, and I’m the only person who knows where all the certs are, and I know I’ll never be laid off. And those two things, I think, do compete in individuals’ minds when they are making these decisions.
-
Jason Soroko
And, Tim, exactly what you just talked about, that’s the reason why cybersecurity in general needs to be top-down decision making. So, in any organization these are complex topics. Right? As much as we like to try to make things simple in our podcast and break it down, there is still an incredible amount of complexity and so if you are an executive within a company you want to push that off to people who are experts. You know that guy in the back room who is the Linux administrator, well, that guy understands all this stuff, so you are gonna put the burden of the responsibility on that person. But if you are a leader, you also need to understand that that person has motivations that might not be totally aligned with your CIO, your CSO, and other people at the executive table. This needs to be a top-down decision where total certificate agility should not be left to the people in the trenches. The decision to move forward with that really needs to be a top-down decision as far as I’m concerned.
-
Tim Callan
Yeah. This could be a mandate. This could be a department objective or a team objective and that’s part of the way that you make this really happen. And, you know, let me make one last point. I like to push this concept of total certificate agility but, again, let’s remember that’s sort of the Holy Grail and this is a perfect example as so often happens in security that we can’t let perfect be the enemy of good. Right? If you can improve your company’s certificate agility, you are going to be better off. And if you improve that incrementally in stages and you approach totality even if you never actually get all the way to 100% agility, you are making things safer. You are reducing risk. You are making things more efficient. You are doing your coworkers and your customers and your employers a favor.
-
Jason Soroko
Well, write it down, folks. Total certificate agility. I think Tim just coined a term.
-
Tim Callan
I think so. So, thank you very much, Jay.
-
Jason Soroko
Thank you, Tim.
-
Tim Callan
Thank you, listeners. This has been Root Causes.