Root Causes 16: PKI for DevOps Environments
DevOps as a software development and deployment methodology has radically transformed enterprise computing. This approach brings with it new architectures and tools such as containerization, Kubernetes, and multi-cloud. Learn how PKI plays a critical role in DevOps environments and how enterprises can best use certificates to keep their platforms safe.
- Original Broadcast Date: May 14, 2019
Episode Transcript
Lightly edited for flow and brevity.
-
Tim Callan
Today we’re talking about DevOps.
-
Jason Soroko
DevOps, the hot topic at the moment. When I’m speaking to people about the topic of DevOps and the underlying technologies that are commonly used, especially from a security standpoint, I’m finding that the storyline of how we got here might have gotten lost somewhere.
-
Tim Callan
It’s interesting, DevOps isn’t really a technology. DevOps is a method, a practice.
-
Jason Soroko
Yeah, a set of principles. Just the terminology, those two words that are kind of slammed together. Dev. Ops. It really is all about better collaboration between Developers and the Operations team that usually has to deal with the software that is the output of those developers.
-
Tim Callan
Right, and it’s about agility, and it’s ultimately interwoven with a bunch of computing platform technologies that are transforming or have already transformed the modern IT infrastructure. But trust and security have to be approached in a different way once you’re using DevOps principles, right?
-
Jason Soroko
Certainly. There definitely are a number of technologies. I think in this podcast we’re going to be talking about one of them. But even before we get there, let’s go back to that goal for a second, between the developers and the operations teams.
Probably one of the main goals with DevOps is trying to create a fast and stable workflow between the two groups, because right now there’s a lot of siloing between them. The way things had traditionally been done, with waterfall methods of software development, maybe worked fine in the old monolithic coding days. But today, with burgeoning technologies such as cloud, how do you get your applications up to the cloud quicker, with fewer bugs? How do you get a faster rate of going from the point of conceiving something, getting it out there and live, and then being able to rapidly iterate and change it? The people who run operations and the people who run development are going to need to hold hands a lot tighter.
There are maybe four principles behind this idea. One of the main ones is everything as code. Probably one of the biggest problems with collaboration between developers and operations is just the sheer amount of manual work that had to happen.
In other words, ask yourself this simple question. If I gave you an operations person who is monitoring a cloud service you just built, and a developer who is building patches for it, what are the chances on any given Monday morning that those two people will have the exact same system, if they both worked manually to build the Linux distribution and all the dependencies and all the things that make up a computing system to make it go?
The chances are pretty slim, right?
-
Tim Callan
Yeah.
-
Jason Soroko
So even back in the traditional days, how many times did you hear the developer give the excuse to the operations guy, “Well, this thing works on my computer.”
-
Tim Callan
More times than I can count. That’s why we have QA, right? Every developer always confirms that it runs and it always runs on their system, but clearly that isn’t the whole universe.
-
Jason Soroko
Think about where we’ve come from with various other kinds of technology, such as virtualization, which was a way of bundling whatever application you were running together with a specific operating system, and then having a lot of control over it. That may have been fine in a more monolithic way of thinking, and it still works today for a lot of functions, but for the most part, a lot of code now is being written in discrete bits, and those discrete bits don’t need the entire weight of an entire operating system.
So there are lighter weight forms of that. We’ll get into that technology in a moment. That’s containerization, and we’re going to talk about the security of containerization a little bit later in the podcast. But let’s talk about this whole idea of DevOps again, and maybe four ideas behind this.
Everything as code: a really important concept within DevOps. Remember the scenario I gave you where you have two different people both trying to build their own infrastructure to be able to match each other. The chances that every single configuration within those things is going to be identical is pretty slim. Therefore, the entire infrastructure should be codified in some sort of declarative specification.
This means that standing up infrastructure really shouldn’t be done by hand. It should be codified for consistency into a type of template that can be repeated often. There are lots and lots of tools for that now. You’ve probably heard of Chef and Puppet; they all have their strengths and their reasons for why you’d use one or the other. But the idea is to codify how things stand up, especially because you’re constantly going to be bringing cloud infrastructure up and constantly bringing it down, and you want those things to happen very, very consistently, with the results being the same every time.
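Editor’s note: since the hosts describe “everything as code” only in the abstract, here is a minimal, hypothetical Python sketch of the idea: the desired state of a machine lives in a declarative spec, and a single apply() routine converges any host toward it. The spec keys and the apply() helper are invented for illustration; real tools like Chef, Puppet, or Terraform do this at far greater depth.

```python
# Toy "everything as code" sketch: desired state as data, applied repeatably.
desired_state = {
    "packages": ["nginx", "openssl"],
    "services": {"nginx": "running"},
    "files": {"/etc/motd": "Provisioned by pipeline build 1234\n"},
}

def apply(state: dict) -> None:
    """Converge the host toward the declared state (actions stubbed out)."""
    for pkg in state["packages"]:
        print(f"ensure package installed: {pkg}")      # e.g. an apt/yum call
    for svc, status in state["services"].items():
        print(f"ensure service {svc} is {status}")     # e.g. a systemctl call
    for path, content in state["files"].items():
        print(f"ensure {path} contains {len(content)} expected bytes")

# Because the spec is code, two people who run it get identical systems.
apply(desired_state)
```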
-
Tim Callan
Not to dumb it down too far, but what you’re saying is you want your code to be consistently running in the same environment every time.
-
Jason Soroko
You at least want to get to the point where your code is actually running on the exact same platform. For those of you who come from a pure Microsoft stack world, Windows itself has all kinds of dependencies as part of its platform, and Windows is great for that. It’s why stuff is still built on Windows Server to this day and probably will be for many years to come.
I think with the cloud, though, you now have all kinds of different Linux distributions, and by definition every single Linux distribution is bundled with a different set of code, a different set of dependencies, a different stack of this and that, all the way from the GUI on down to the nuts and bolts.
So in terms of the “everything as code” idea, keep in mind that not only do you want this declarative specification so you can very easily have a consistent platform all the time, but you also want, as always, your source code controlled in something like Git. And for the same reasons as before the DevOps days, you also want your code tested in some sort of quality assurance program, some sort of pipeline process, to make sure it passes muster.
But there’s another, probably newer idea that we’ll call immutability. Tim, remember when I said that in the past you might have set up a Windows server for your infrastructure, and that thing probably stayed up for a very long time, and you just made changes to it by hand. You might have had development pass patches over to the operations team, and the operations team might have applied those patches as time went on, but the server itself never really changed. There are probably several ways to define immutability, but I really like the idea that infrastructure should be considered disposable.
What you gain by that is that it avoids the infrastructure being patched to some level that’s inconsistent with another. In other words, your QA systems, your developer systems, and the systems that might be used as test servers by operations should all be pretty much identical, and all of them should be considered disposable. You should never have just one server that lives forever and becomes the de facto gold standard that everybody needs to figure out how to match. Everything is immutable.
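Editor’s note: a minimal sketch of the immutability idea in Python. The Server class and deploy() function are hypothetical; the point is that a new fleet is built from a versioned image and the old one is discarded, rather than any server being patched in place.

```python
from dataclasses import dataclass

@dataclass(frozen=True)        # frozen: a Server object cannot be mutated
class Server:
    image_version: str

def deploy(new_version: str, fleet: list[Server]) -> list[Server]:
    """Replace the whole fleet instead of patching servers one by one."""
    replaced = [Server(image_version=new_version) for _ in fleet]
    print(f"disposed {len(fleet)} servers, launched {len(replaced)} at {new_version}")
    return replaced

fleet = [Server("v1.0"), Server("v1.0")]
fleet = deploy("v1.1", fleet)  # every server is now provably identical
```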
-
Tim Callan
This is a decoupling, a detachment, of the actual hardware from the software processes, from the workstreams.
-
Jason Soroko
One hundred percent. The developer no longer has to do an enormous amount of thinking. In fact, they’re discouraged from doing a lot of thinking about the server.
-
Tim Callan
I need to know a few things. I need to know how much compute I have and how much memory is available and how much storage is available, and other than that I do not need to worry about any of that other stuff.
-
Jason Soroko
You got it.
Let’s talk about one of the underlying technologies that’s really helping out and it’s having a real renaissance right now because of how important it is.
We talked about virtualization a little earlier, and obviously that was a way of taking monolithic pieces of software and running them in isolated virtual machines. And that’s been fantastic. It still works, and people use it to this day for all kinds of purposes. But what happens when you no longer have a problem standing up all kinds of small servers in your cloud environment, for example, or even in your own private rack space?
One problem you still have, though, is that there are all these different distributions of Linux out there, and you don’t want your code to be distributed in such a way that the operations people have to worry about the dependencies. Software obviously has its dependencies, and the servers have all kinds of different starting points. You want to be able, from just about any starting point, to get up and running and have the discrete bit of logic just do its thing.
This is where the concept of containerization comes from. Now, most people think that containerization is a form of virtualization. I think that’s where a lot of people get into trouble, because it’s not a VM. A container is really about bundling a discrete piece of logic, its code essentially, along with its dependencies. That’s probably one of the most important concepts you can understand if you really want to understand containerization.
The question then becomes, “Well, why do I need it? We already have virtualization.” Virtualization isolates an entire operating system, each instance hosted by a hypervisor. Containers, by contrast, run within a container engine on a shared host. You might have heard of Kubernetes, which I think is derived from the Greek word for helmsman, and which is an orchestration engine, something we’ll get into in a moment. But keep in mind that containers are much more lightweight than VMs and much less isolated from the underlying operating system.
If you want to understand containers at the highest level, they’re really lightweight ways of bundling together code along with its dependencies, and containers really should not be thought of like a VM, because of how much less isolation you have from the underlying operating system.
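Editor’s note: a small sketch of the “code plus dependencies” idea using the docker-py SDK (pip install docker), assuming a local Docker daemon is available. The image bundles an interpreter and libraries with one discrete task; no full guest operating system boots, which is why startup is nearly instant.

```python
import docker

client = docker.from_env()     # talks to the local Docker daemon

# Run one discrete piece of logic inside a disposable container.
output = client.containers.run(
    image="python:3.11-slim",  # the code's dependencies travel with it
    command=["python", "-c", "print('hello from an isolated container')"],
    remove=True,               # disposable: the container is gone when done
)
print(output.decode())
```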
-
Tim Callan
Okay, and are you going to get into the significance of this less isolation as we move forward? It seems like a pretty important point.
-
Jason Soroko
It’s probably the most important point from the standpoint of security, because when you’re running things within a VM, a virtual machine, you can definitely count on the hypervisor itself providing some level of isolation in memory space between the virtualized operating system and the underlying operating system internals.
Obviously, there are problems. Check out any of your favorite hacker conferences and there will be examples of people finding holes in various hypervisors. But suffice it to say that jumping out of a hypervisor is not something the average script kiddie is going to do on a Saturday afternoon.
But here’s the problem: think about all these different discrete pieces of code, which have their own interconnects. They connect to a database. They have their own user connections, human-based authentication. One container might call another, not just within its own cloud but in another cloud; that’s multi-cloud containers. Any time you’re reaching out and touching anything, you’re traversing network boundaries that are no longer as clear and secure as you might remember from the old monolithic days.
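Editor’s note: a minimal client-side sketch of mutual TLS using Python’s standard ssl module. All hostnames and file paths are placeholders. Each service presents its own certificate and verifies its peer’s against a shared CA, so identity is checked in both directions every time one container reaches out to another.

```python
import socket
import ssl

context = ssl.create_default_context(
    ssl.Purpose.SERVER_AUTH,
    cafile="/etc/pki/cluster-ca.pem",    # the CA both sides trust
)
context.load_cert_chain(                  # this service's own identity
    certfile="/etc/pki/service-a.pem",
    keyfile="/etc/pki/service-a-key.pem",
)

with socket.create_connection(("service-b.internal", 8443)) as sock:
    with context.wrap_socket(sock, server_hostname="service-b.internal") as tls:
        print("peer verified:", tls.getpeercert()["subject"])
```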
-
Tim Callan
We might be talking about a lot of containers, right Jay? If you break down your complicated enterprise work streams into these little discrete bits, and each of these discrete bits is running as its own container, you can have a very large number of these, and you can have a very complex and dynamic environment that you’re dealing with.
-
Jason Soroko
For those of you who come from a certain day and age, when you remember writing a discrete function, a subroutine that did one specific task, that one specific task might be containerized today and therefore completely isolated out of a code stream into its own cloud-hosted container. The cloud worries about balancing the compute time and the database power and everything else that might need to be going on behind that to make sure performance is at the right level. But to your point, yes.
A single piece of software, if you want to call it that, a whole solution might be calling all kinds of containers, might be using other people’s containers. It just goes on and on. In fact, this is the worst kind of spaghetti logic potential that perhaps we’ve ever had.
Maybe other people could argue otherwise, but I think, Tim, the reason we don’t have to worry too much about it becoming spaghetti code is the amount of benefit we’re getting from isolating discrete bits of code and hosting them in the cloud. There is this new way of thinking, especially with the DevOps cultures we’re now starting to see develop. It’s all a very, very big net positive.
There is one big potential net negative. The orchestration engines that help curtail the spaghetti potential, and that do things really well like handling networking definitions and everything else necessary to make sure that lots and lots of containers work together, are actually running Certificate Authorities. That’s because there is such a big need for TLS certificates, for things such as mutual TLS authenticated sessions to other APIs or discrete pieces of logic. And if your application happens to be a web application, your SSL certificate might even be provisioned within that logic every time this immutable, disposable infrastructure is brought up and brought down. That’s a lot of certs all of a sudden.
Think back to the old days: “Geez, I just need one SSL cert, and it’s going to sit there for a year, two years. This particular application goes off and makes an API call to something. I’ve got a TLS certificate that I provisioned a long time ago. I’ve got it written down in a spreadsheet, so when that thing expires, I’ll just go handle it.” Multiply that by, I don’t know, pick a large number in your mind. All of a sudden it is unmanageable, and as for the Certificate Authorities we’re talking about, I think the majority of them are self-signed CAs that are just sitting there on not terribly well-protected premises, if you want to call it that.
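Editor’s note: to make the scaling problem concrete, here is a sketch of what even naive expiry tracking turns into once the spreadsheet stops working. Hostnames are placeholders; it uses the standard library plus the cryptography package (version 42 or later for the _utc accessor).

```python
import ssl
from cryptography import x509

endpoints = [("service-a.internal", 8443), ("service-b.internal", 8443)]

for host, port in endpoints:
    pem = ssl.get_server_certificate((host, port))    # fetch the leaf cert
    cert = x509.load_pem_x509_certificate(pem.encode())
    print(f"{host}: expires {cert.not_valid_after_utc}")
```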
As a PKI guy, I just shake my head. I love this technology. It is the future, but when it comes down to purely the TLS certificate management part of it, I don’t think this has been completely thought out yet.
-
Tim Callan
And Jay, correct me if I’m wrong, but the certificate element of this architecture is indispensable because it is how identity is provided. It is how you make sure, before your software takes action, that it’s getting its command from the legitimate source of that command, or before your software reports information, that it’s reporting it to the legitimate recipient, right? Like the only way this is done is certs, correct?
-
Jason Soroko
Correct.
-
Tim Callan
So if you are using a popular containerization environment, you’re using Docker, you’re using Kubernetes, you are running a CA whether or not you know it.
-
Jason Soroko
With Kubernetes, absolutely, you are.
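Editor’s note: one hypothetical way to see the CA you are already running is to list the certificate signing requests flowing through a cluster’s built-in signers, sketched here with the official kubernetes Python client (pip install kubernetes) and a working kubeconfig.

```python
from kubernetes import client, config

config.load_kube_config()              # or load_incluster_config() in-cluster
csr_api = client.CertificatesV1Api()

for csr in csr_api.list_certificate_signing_request().items:
    # Signer names like "kubernetes.io/kube-apiserver-client" show the
    # cluster itself acting as a certificate authority for these requests.
    print(csr.metadata.name, "->", csr.spec.signer_name)
```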
-
Tim Callan
Okay, with Kubernetes. So, what are the potential consequences for people who all of a sudden have become their own certificate authority, maybe without intending to?
-
Jason Soroko
You know, if I’m a CIO or a CSO and I’m hearing this for the first time, I might be asking my DevOps team these questions. Because Tim, you know one of our favorite themes across all these podcasts we do is: what happens when certificate management goes bad?
Things go wrong. And one question we haven’t even asked much yet, because we think the answer is perhaps too obvious: what happens if a CA gets compromised? The answer to all these questions is very, very bad things.
-
Tim Callan
Potentially once your identity system is not reliable, then any entity anywhere in your workstream might be a bad actor, might be spying, might be stealing data, might be giving false commands. All of those things are possible. They could be stealing your money or just disrupting your operations or stealing credentials or other things like that.
-
Jason Soroko
So Tim, let’s think about this now. Let’s widen out the problem just slightly. For those of you being told by your management, “Hey, we’re really glad you created this proof of concept inside of one cloud, such as maybe Amazon AWS,” you could tell them, “I do my TLS management right within that cloud.”
Well, that’s great. It’s still a self-signed CA. It still has its issues, but at least you have some kind of management system that’s helping you out, that perhaps is part of the infrastructure you happen to be using with that one cloud. What happens if your CIO says tomorrow, “AWS is too expensive. I want to rip it out and put it onto some other cloud tomorrow, and then next week I want to bring it in-house, and oh, by the way, the week after that…”
-
Tim Callan
And not all of it. I’m just going to move some of it, and I’m going to leave some of it where it is.
-
Jason Soroko
In fact, your CIO is going to say things like, “You told me this thing was immutable, so why can’t I do that?” And then you’re going to give the old answer, “Homina homina,” and go, “That’s all great, sir or madam, but the problem is that our entire security infrastructure is baked into one particular cloud.” And don’t think Amazon and the others haven’t thought of that, right?
-
Tim Callan
So, what do you do?
-
Jason Soroko
There’s such a thing as a trusted third-party CA, and one of the reasons you go to a trusted third-party CA is so that your trust model is basically maneuverable and interoperability is the norm, instead of just setting up some self-signed Certificate Authority through a series of a few Linux commands and then crossing your fingers and saying, “I didn’t see any error messages, so it must be good.”
You might want to think about setting up a CA with people who actually understand how to run a proper CA and know how to protect it and know how to make it reliable and all those things that you don’t have if you just do it by yourself.
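Editor’s note: to show just how little stands between you and becoming your own CA, here is a sketch using the cryptography package. Note what is missing: the CA private key is plain bytes with no HSM, no audited issuance process, and no revocation infrastructure behind it.

```python
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "toy-devops-ca")])
now = datetime.datetime.now(datetime.timezone.utc)

ca_cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                    # self-signed: issuer == subject
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=365))
    .add_extension(x509.BasicConstraints(ca=True, path_length=None),
                   critical=True)
    .sign(key, hashes.SHA256())
)

# "I didn't see any error messages, so it must be good."
print(ca_cert.public_bytes(serialization.Encoding.PEM).decode()[:64])
```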
-
Tim Callan
This is why we have specialization, right? This is the reason that we’re all not growing grain in our backyards. It makes more sense to have somebody else do that, and we can all focus in on things that we’re good at. I think this is a perfect example of that.
-
Jason Soroko
It gives you the ability to say, “I have a well-protected CA. I can rely on it. This is run by the people who know how to do it properly.” But I think you can go even further than that. There’s added value in choosing a commercial third-party CA for your DevOps and containerization purposes: we can also wrap in capabilities such as code signing your containers. If something gets modified between the developer release and the actual code being executed, you can rest assured that you still have the integrity of your own code. Compare that to today’s third-party vault tools, which might act like CAs but don’t have that kind of capability.
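Editor’s note: a conceptual sketch of container signing with the cryptography package. The digest value is a placeholder, and real tooling (sigstore/cosign or Docker Content Trust, for example) handles key distribution and registry integration; the sketch only shows the sign-at-release, verify-at-deploy idea.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

signing_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
image_digest = b"sha256:placeholder-image-digest"

# At release time: sign the image digest once.
signature = signing_key.sign(image_digest, padding.PKCS1v15(), hashes.SHA256())

# At deploy time: raises InvalidSignature if the image was modified anywhere
# between the developer's release and the code actually being executed.
signing_key.public_key().verify(
    signature, image_digest, padding.PKCS1v15(), hashes.SHA256()
)
print("image digest signature verified")
```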
The ability to have a single root for all your applications, the ability for the root and subordinate private keys to be protected in an HSM, the ability to have multiple Kubernetes clusters that are rooted in a single place, or whatever other trust model you happen to need: those are the kinds of things you go to a trusted third-party CA for.
You’re not going to be able to do that yourself, and typically the people writing tools, like the vault tools you’ve all read about, that’s not what they’re experts in doing. What most vault tools are really trying to solve is this: “You’ve got static credentials for your MySQL database, or behind the scenes for your Mongo database, whatever it happens to be, and I need to automatically log into those systems from a headless, discrete logic system. I am not going to log into that myself, so I need to pull the credentials from somewhere, and I need to pull them securely.” Those vault systems do that very, very well.
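Editor’s note: a sketch of the credential-brokering job vault tools do well, using the hvac client for HashiCorp Vault (pip install hvac). The address, token, and secret path are placeholders; a headless service pulls database credentials at runtime instead of baking static passwords into its code.

```python
import hvac

client = hvac.Client(url="https://vault.internal:8200", token="s.example-token")
secret = client.secrets.kv.v2.read_secret_version(path="apps/orders-db")
db_creds = secret["data"]["data"]    # e.g. {"username": ..., "password": ...}
print("fetched credentials for:", db_creds["username"])
```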
But as soon as you get into the world of PKI, there’s the complexity of the trust model, the complexity of rotating those certificates, and, here’s another concept, your OCSP responders. Once you have large, complex, enterprise-level applications within these containerized systems, are you going to do revocation checks on the certificates you’ve actually issued? A modern trusted third-party CA will also be able to set up OCSP responders for you and give you that kind of capability. Those are powerful PKI tools that you’re just not going to have if you’re setting up your own self-signed Certificate Authority.
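Editor’s note: a sketch of the client side of an OCSP revocation check using the cryptography package. It assumes cert and issuer are already-loaded x509.Certificate objects and that responder_url came from the certificate’s authority information access extension; a managed CA operates the responder this request goes to.

```python
import urllib.request
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.x509 import ocsp

def check_revocation(cert, issuer, responder_url: str) -> str:
    request = (
        ocsp.OCSPRequestBuilder()
        .add_certificate(cert, issuer, hashes.SHA1())  # SHA1 is conventional here
        .build()
    )
    http_req = urllib.request.Request(
        responder_url,
        data=request.public_bytes(serialization.Encoding.DER),
        headers={"Content-Type": "application/ocsp-request"},
    )
    with urllib.request.urlopen(http_req) as resp:
        ocsp_resp = ocsp.load_der_ocsp_response(resp.read())
    # A production check would first confirm response_status is SUCCESSFUL
    # and validate the responder's own signature.
    return ocsp_resp.certificate_status.name           # GOOD / REVOKED / UNKNOWN
```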
-
Tim Callan
Right, and if these are your most essential, mission-critical systems, if these are the places where you move your secrets around, where you control your money, where you service your customers, yeah, IT departments are asking themselves, “How can I be secure at that level?”
-
Jason Soroko
And if you’re managing any other kind of key material, SSH keys, or provisioning SSL certificates for web applications that you happen to be running, a trusted third-party CA can handle those things for you all at the same time, as well as your code signing, as well as setting up your CA properly and protecting it properly, all those kinds of things. So for those of you working furiously right now with these fantastic new technologies in DevOps, you might want to think about not just doing the proof-of-concept example of firing up a little CA and hoping for the best and crossing your fingers. People who have been in the PKI business a long time can really start helping you out here.
-
Tim Callan
I think maybe that’s a good point to end on, Jay. As always, this is a complex topic, there’s more to it, and I’m sure we’ll get into more depth in episodes to come.