Microservices Security Summercamp – Session 1


Security Microservices Summercamp Session 1: Introduction to Security for Microservices

On June 26th, we explored the new architecture, how it is different from applications of the past, and what considerations security teams need to make to ensure strong security and compliance monitoring for these modern applications.

Andy: Hello, welcome everyone to our Microservices Security Summer Camp. This is session one. Today is gonna be an introduction to microservice security.

So the Microservices Summer Camp Series is gonna be a three-part series. We’ll begin today with our introduction. And this is gonna walk us through what’s different about modern applications and what the implications are for security. And that’s just gonna start getting the juices flowing so you have a good idea of what it is that we’re gonna get into. And in a couple of days, we’re gonna follow up with some best practices. So how do you secure a microservices environment? Some of the basics that you need to get done. Then we’re gonna give it a couple of weeks so that sets in, and you can evaluate your own environment and what some of your needs might be. And on July 12th we’ll talk about how to evaluate different solutions.

Our host for today, our camp counselor is Christopher. He’s the original architect behind Tigera’s Project Calico, which many of you know about. He speaks quite a bit for Tigera. He’s at 60 plus meetups per year. He educates on both networking with Calico as well as network security for modern applications. He’s also our primary consultant. He runs a consulting team here and deals with our … Works with our enterprise clients to help them meet their security compliance requirements for these applications. And he just so happened to actually be a park ranger.

Christopher L: Alright. And it’s Christopher Liljenstolpe as Andy chickened out on my last name. So welcome. Thank you for joining our camp today. I thought what I’d start off with is really just sort of explaining a little bit about what’s changed. And every time we go through an industry shift, everyone talks about how the world has changed. And then you look back and say it really didn’t change all that much and so why is this time different. Why are microservices really bringing change to the environment in which you deploy your applications and how you deploy your applications. So I thought I’d spend a little time talking about what these changes are. Because these changes are gonna impact many different parts of an organization. Not just security, but all up and down the IT and application delivery space. And maybe even beyond. So let’s talk about some of them.

First of all, why are we doing this … Why is everyone running toward microservices? What’s so great about microservices? Why is application development changing? First, let’s talk a little bit about what is changing. We’re moving to microservices. So monolithic applications are being decomposed into constituent components. The thing you want to think about is functions or services that are mixed together to create an application or remixed in different patterns to create other applications.

In order to deliver these microservices, the industry sort of settled around the concept called containers. And it’s a fallacy to think about a container as a lightweight VM. I know a lot of people think about this as just the next generation of virtual machines. And it’s really not. If I look inside a virtual machine, a virtual machine is really just a server and it has all the same components of a server, it has an operating system, it has user [inaudible 00:03:57] applications, you can SSH into it. You can … All the things are there in the VM. A container is inherently different. A container is really just a package around an application and only bring … Includes the things that are necessary to run a given piece of code within an environment. So it doesn’t really have an operating system. It relies on the underlying platform to provide that operating system. It really doesn’t have a bunch of user land applications.

In fact, I’ll touch on this a little bit later. I mean, in most cases you can’t even SSH into a container because there’s no SSH [inaudible 00:04:29]. Most of the things that you would expect to find in a VM aren’t in containers and that has some advantages and it also causes some changes in the way you think about them. So you really want to think about them as more of a mechanism to package up a microservice or a chunk of code with the necessary items to allow it to run. So what it’s actually dependent on, what libraries, et cetera it needs to have to function. But it is not a virtual machine in the classical sense of the term.

Next we’ve got an orchestrator. So if we start thinking about these microservices, there are a lot of them and they are constantly changing. And we’ll talk about more… We’ll dive into that a little bit more in the next couple minutes as well. But in order to control and manage this large fleet of containers, we need some kind of mechanism to orchestrate that because at this point we’re at a scale where a human is not easily gonna be able to manage this infrastructure manually. So what we need is some kind of automation that you tell what it … You tell the automation, in this case, orchestrator what your intent is. My intent is that this application is built out of these microservices and it’s going to scale this way based on load and this is how it recovers from [inaudible 00:05:57] et cetera. And then you let the orchestrator enforce the intent rather than you doing it manually. If you’re manually intervening in a system in a microservices environment, you’re doing something wrong. You should be instead changing the intent you express to the orchestrator and let the orchestrator automate that. And we’ll talk a little bit more about this as we go too.
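
The intent-driven model described here can be sketched in a few lines of Python. This is only a toy illustration, not how any particular orchestrator is implemented: the `Intent`, `Cluster`, and `reconcile` names are hypothetical, and a real orchestrator handles scheduling, health checks, and much more. The point it shows is that the operator changes the declared intent and the loop works out which containers to start or stop.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Desired state the operator declares (hypothetical shape)."""
    service: str
    replicas: int


@dataclass
class Cluster:
    """Toy stand-in for the running environment."""
    running: dict = field(default_factory=dict)  # service name -> container ids
    next_id: int = 0

    def start(self, service):
        self.next_id += 1
        self.running.setdefault(service, []).append(f"{service}-{self.next_id}")

    def stop(self, service):
        self.running[service].pop()


def reconcile(intent, cluster):
    """Drive the actual state toward the declared intent; return actions taken."""
    actions = []
    current = len(cluster.running.get(intent.service, []))
    while current < intent.replicas:
        cluster.start(intent.service)
        actions.append("start")
        current += 1
    while current > intent.replicas:
        cluster.stop(intent.service)
        actions.append("stop")
        current -= 1
    return actions


cluster = Cluster()
print(reconcile(Intent("checkout", 3), cluster))  # scale up from nothing
cluster.running["checkout"].pop()                 # simulate a container crash
print(reconcile(Intent("checkout", 3), cluster))  # the loop heals the gap
print(reconcile(Intent("checkout", 1), cluster))  # operator edits intent only
```

Nobody logs in to start or stop an individual container in this sketch. The only human input is the `Intent` value, which is the behavior the talk argues for.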

And lastly, elastic cloud infrastructure. Basically in this model and in this concept, we all have now private data centers maybe. Maybe we don’t. Maybe we’re moving to the cloud. Most people are moving to some kind of public hosting infrastructure as well as maybe a private cloud or in lieu of a private cloud. So we need to be able to run this environment wherever currently makes sense. So we’re working with one large enterprise and they’ve been for very good reasons very much on-prem. Substantially on-prem. But now they’ve decided they wanna actually use public cloud infrastructure to extend their application footprint into new regions because they want to see if there’s demand in those regions or they want to create demand in those regions before they make the capital investment to build out a data center. So they’re actually using public cloud infrastructure as footprint extension. You see other people use it for shock absorption. Christmas ordering season or Mother’s Day in the US et cetera. Or you need a rapid increase in capability and building another data center takes time whereas I can just go lease servers or lease resources from one of the major cloud providers like Amazon, Azure, Google, IBM, et cetera.

Also, no matter what you think today in your organization you’re gonna be doing: we’re gonna be doing public cloud, we’re gonna be doing private cloud, we’re gonna be doing public cloud in, say, AWS. Situations change. You might do M&A and acquire a company that is mainly based in Azure or Google. You might move into an area, or a part of the industry, or a market segment where data sovereignty becomes important and you need to put data in a given country. And the only cloud provider who has resources in that country might be Azure. So now even though you said you were gonna use AWS, you now need to use Azure because of these other requirements, or Google, or whoever. And so you need to be able to make sure that even though you’ve built this wonderful plan for today that you don’t need to rip everything up and start again in the future, which might be tomorrow.

So you need some place … You need a model where you can take this orchestration environment, and these containers, and the microservices they contain, and the intent you’ve expressed and deploy that wherever makes the most sense at that point in time without being tied to those underlying infrastructures. You don’t want to do a whole bunch of work around automating to one of the providers and then having to go redo that work automating to another provider. You want to [inaudible 00:09:22]. So when we talk about elastic cloud infrastructure, that’s really what we’re talking about.

So let’s talk, again, at a very high level about what is happening here. So workloads drive exponentially greater demands on networking and security. A couple of concepts around containers. Containers, as I said, aren’t full-service servers. So whereas a VM might take five to 10 minutes to boot because it’s a server booting and it’s booting on slower hardware, i.e. virtualized hardware, a container only needs to start the actual application. So container start times are usually sub-second. The first time you pull them down after [inaudible 00:10:04] it might be a couple seconds but it’s definitely not five to 10 minutes. So your containers start much, much faster. You have many more containers because now what we’ve done is taken a few large applications and split them up into their constituent parts, microservices. So as a general rule when we talk to customers, they might have eight, 16, 20 VMs on a server. They’re targeting 80 to 120 to 200 containers per server. In fact, I’ve heard people talking about as many as 1000 containers per server. So you have many more workloads in your infrastructure than what you had before.

Containers by their nature, and we’ll talk about this a little bit more, have inherently shorter lifetimes. One of the things we’re … Where we’re going in this market, and we’ll talk about it in a minute, is agility. I want my code out there and I want to rev it frequently. We very often talk about MVP in the industry, minimum viable product. You get an idea, get it coded quickly, get it out there. See if your customers, or your users or consumers like it. If they like it, then you start tweaking it. Try blue-green testing. Try this way out versus another way out. See which people like more. Iterate. Iterate. Iterate. And because the individual microservices are small and simple, they’re easier to iterate faster. I don’t need a waterfall model. I can use agile development methodologies. There are people out there who are pushing hundreds, if not thousands of code changes a day into production. That may not be what you’re gonna do, but as a general rule, it’s safe to assume that containers have much shorter lifetimes. Lifetimes that can be measured in wall clock time versus VMs, which you measured in calendar time. So containers might last minutes or hours, maybe a couple of days. Whereas VMs would last weeks, months, even years. So there’s a much shorter lifetime for these things as well.

Together this means you’re gonna see … At least our belief is, and what we hear when we talk to customers, conservatively a 250x churn in the network. The network will have changes in its endpoints 250 times more frequently than what you had in a VM world and certainly much more than that in a physical world. And it’s churn that causes problems at scale. So if you have orchestration systems today that may already be fragile and brittle, say around the network in your VM world, when you start churning that 250 times more than you are now, that brittleness will, I guess to be polite, express itself more violently than it does currently. It also means that there’s a lot more attack surface. You’ve got more workloads. Those workloads come from more places. Therefore you’ve got a much larger attack surface in your environment. So those are some of the challenges that come along with this migration to microservices.
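
To see how a figure of that order can arise, here is a back-of-the-envelope calculation in Python. The inputs are assumptions chosen from within the ranges mentioned in the talk (16 VMs versus 160 containers per server, a 50-day VM lifetime versus a 2-day container lifetime). They are not the speaker's own derivation of the 250x number, and different picks from those ranges give very different ratios.

```python
def churn_per_day(workloads_per_server, lifetime_days):
    """Average endpoint replacements per server per day at steady state."""
    return workloads_per_server / lifetime_days


# Assumed, illustrative inputs (not measurements):
vm_churn = churn_per_day(workloads_per_server=16, lifetime_days=50)
container_churn = churn_per_day(workloads_per_server=160, lifetime_days=2)

ratio = container_churn / vm_churn
print(f"VM churn: {vm_churn:.2f}/day")
print(f"container churn: {container_churn:.0f}/day")
print(f"ratio: {ratio:.0f}x")
```

With these inputs, roughly 10 times the density multiplied by roughly 25 times shorter lifetimes gives a ratio of about 250. If containers live for minutes rather than days, the ratio grows by orders of magnitude, which is consistent with the speaker calling 250x conservative.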

So what does this really mean? External pressures. Why are we doing this? Your competition, your customer or user expectations are driving a more responsive and agile software and service deployment environment. I don’t care if you’re a software company and you’re doing say online HR and people expect to be able to make a status change on an employee and immediately have it replicate to payroll and authentication systems, et cetera. You may be a more traditional enterprise. Let’s say you’re a car manufacturer. But even though you’re a car manufacturer, people are gonna want to start interacting with you to look up service issues, request service appointments, maybe even change some of the software in their car eventually. They’re gonna want that to behave like they have come to expect everything else to behave on the web. So even though you’re not a software … Don’t think you’re a software company, you are. And once you become a software company and you have a presence on the public networks, that means you are gonna be held to that standard of expectation. If we look at legacy enterprises, end of day batch processing before you update someone’s account record isn’t gonna pass muster if you’re a bank anymore. People want to see their balance right now, not what the balance was yesterday.

So there’s gonna be … There are many things both in the modern world and in the more legacy world that are driving people to web time improvements, web time interactions with their customers, real time interactions. So that’s some of your external pressures. And if you don’t do that, your competition’s going to do that. And that’s not a good outcome. So even if you’re not a software business, you are a software business and we’re seeing a lot of our enterprises realizing that they are a software business as well as a manufacturing business or a services business.

Developers have grown up now in a world of CI/CD and FOSS resources. Your developers came out of school, they used open source projects as part of their project code … As code for their projects they did in school. They’re used to checking code in to GitHub and having it automatically deploy, and continuous integration, continuous delivery … And this permeates everything. Even the way people do documentation today in the development world is software based. They use the same primitives et cetera: you make a change to the documentation, you check it in, it gets processed and rendered in whatever format, and spit out the back end. So this is the world developers want to work in. Developers don’t want to work in legacy, waterfall models where they don’t have access to all the code that’s out there that already solves the problems. The hard problems they’re trying to solve. So they expect to be able to reuse code. Theirs and other people’s.

Dependency hell is not expected. This is one of the things that really brought containers to light originally: containers bring with them the ability to say this bit of code relies on say Python 2.7 or this library. Another piece of code relies on Python 3.0. Those two don’t get along too well together. If I try and put that piece of code on a server and that server has Python 2.7, that’s great except when people try and put the Python 3.0 dependent code on that server it will break. So then they install Python 3.0 on that server and they break the code that depended on Python 2.7. That’s dependency hell. No one likes it. So one of the things you do is say I’m gonna bring Python 2.7 or Python 3.0. I’m gonna bundle that into this container. So this allows the developer to avoid this dependency hell: well, it worked on my machine, but when I deployed it, it didn’t work. Or I deployed it … Worst case … I deployed it on these four servers and it worked, and on these two servers it didn’t. So this is what your developers want. And the developers are pushing for this model because this is what they’re most productive in, this is the environment they want to work in.

So how have we responded? We’re implementing, even in legacy enterprises, CI/CD chains. Continuous integration, continuous delivery chains. Instead of having a very manual process, developers check in code. That code gets tested, gets checked to make sure that it functions and then is basically available to merge or to deploy into production all with mostly automated processes. So that is … Pretty much everyone today is using something like GitHub or investigating that at least anyway.

We’re … Have brought in containers and orchestrators. So we’ve brought in things like Kubernetes to orchestrate the containers that the developers are developing. Developers are running Docker containers and then we use the orchestrator to orchestrate those out. This is an area of incredible development in the space and it’s moving very fast. But this is the kind of environment that’s being created now.

Fungible undifferentiated resources. The underlying infrastructure shouldn’t have any special meaning. Doesn’t matter if those servers are VMs or servers. Doesn’t matter if they’re in public cloud or private cloud. All I really care is they behave like a basic server [inaudible 00:18:40] server usually. And they have basic networking, they have basic storage. So my applications aren’t tied to specific capabilities or specific hardware platforms. Makes them portable.

Immutable deployments. The thing that is most frustrating when I go into an environment if I’m a developer is sometimes it works and sometimes it doesn’t. So the whole idea is if I checked in the code and it works and I deployed it and it works, then every time I deploy that same code it should also work. So that’s called immutable deployments. Things don’t change once they’ve been checked in and validated. It makes it very easy to regress to a known working environment at any time versus trying to figure out how to back out of a botched change set.

Lots of smaller well-defined pieces of code are preferable to large monoliths. It means that for developers, it’s easier to develop for small, well-understandable blocks of code than a big one million line [inaudible 00:19:40] so we prefer to have these. Also allows us to mix and match. I write a service once. Something that’s gonna write to a Redis database, I write it once, and then I just link to it in all the applications that need to talk to Redis versus rewriting that Redis code every single time in every single application. This allows for internal reuse, as I said, and mixing and matching. I can recompose parts to make different applications or different behaviors. And I write it once and I’m done.

So what’s different in this new world? Where does the code come from? How do you deploy it? How do you modify it? How do you address bugs, malfunction, et cetera? How do you patch it? What’s the anchor of a service? What’s important? And how do you manage the underlying infrastructure? We’re gonna look at each of these briefly and say what’s changed in these environments. We’ll use sort of an operational view and a development view of application delivery. And how fast does it change? Sorry.

So where does the code come from? In a previous environment, you either wrote it yourself or it came from your vendors. You had very limited numbers of sources for your code. You pretty much knew where they came from and knew who to choke when they didn’t work. In the new environment, it can be homegrown. You could’ve written it from scratch or you could have just written glue code that glues other components together. It can come from vendors still, but it also comes from FOSS resources. I can pull code from GitHub, I can fork code from GitHub, I can pull working containers from Docker Hub or Quay, I can be pulling things from Golang libraries, or Python, or Perl libraries, et cetera, or other internal sources. I didn’t write it but someone else in the company did or in a subsidiary et cetera. Or a partner.

So I now have many more places where code can come in. And this code can be mixed and matched. A piece of code from a vendor, it can be mixed with a … Something I pulled off of GitHub and an open source project like NGINX. And I’ve now just built a new application and that application has code that’s sourced from four different locations. Four different authors. Four different languages.

How do you deploy it? In the previous world you’d go ask for necessary infrastructure. I want a server of a certain type, a computer of a certain type. I want storage and network of a certain type to be provisioned. I raised [inaudible 00:22:11] to get this done. This might take weeks, this might take months to get done. Or for the right VMs to be provisioned or built, et cetera. Once that was all done, I’d build a master image of my code in a very legacy, waterfall approach and then I’d deploy it onto that infrastructure and hand-tune it to make sure it worked well. If I got four different types of servers over time, I’d be hand-tuning four different images because I’ve got slightly different behavior. So every one of these things is now a pet. It’s unique.

If I go into the new world, I define a manifest that says this is the code. This is how it should be packaged up and deployed. That itself is a piece of code. It’s a piece of [inaudible 00:22:56] that defines what I want the system to do. It’s my intent. Then I submit that manifest to a CI/CD chain. I check it into Git or Perforce or something along those lines. And it’s examined. Maybe there’s a manual code review. Maybe not. Maybe I run through end-to-end tests, or unit tests, functional tests, et cetera. But off the back end of that, that would normally just get shipped off to an orchestrator like Kubernetes, which will then take that manifest, take your intent, and make it so in the network much like Jean-Luc Picard. Make it so. And so this is a very different behavior. You’ll notice there is no hand tuning, there is no master image, there is no manual deployment onto each server. There is no asking for specific storage, networking, compute. This is all managed [inaudible 00:23:42] by this manifest and the underlying CI/CD chain.
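
The manifest-then-pipeline flow can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the manifest fields, the `validate` rules, and the `pipeline` function are hypothetical and do not correspond to the Kubernetes manifest schema or to any specific CI/CD product. It only shows the shape of the idea, where the manifest is data that gets checked automatically before anything is handed to the orchestrator.

```python
# Hypothetical manifest: field names are illustrative only.
manifest = {
    "name": "orders-api",
    "image": "registry.example.com/orders-api:1.4.2",
    "replicas": 3,
    "labels": {"app": "orders", "region": "europe"},
}

REQUIRED = {"name": str, "image": str, "replicas": int, "labels": dict}


def validate(manifest):
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in manifest:
            problems.append(f"missing field: {key}")
        elif not isinstance(manifest[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    image = manifest.get("image")
    # Look for a tag only in the last path segment, so a registry port
    # such as "host:5000/app" is not mistaken for a tag.
    if isinstance(image, str) and ":" not in image.rsplit("/", 1)[-1]:
        problems.append("image must be pinned to a tag")
    return problems


def pipeline(manifest, deploy):
    """Gate the deploy step on validation, as a CI/CD chain would."""
    problems = validate(manifest)
    if problems:
        return ("rejected", problems)
    deploy(manifest)  # stand-in for handing the manifest to the orchestrator
    return ("deployed", [])


deployed = []
print(pipeline(manifest, deployed.append))
print(pipeline({"name": "x", "image": "repo/x", "replicas": "3"}, deployed.append))
```

The second call is rejected before the `deploy` callback is ever invoked, so a bad manifest never reaches the orchestrator and nobody hand-fixes it on a server.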

How do I modify it? Okay I … Some things I want to do differently. In the previous world, I’d build a new version and test it. I’d then log into the compute that’s hosting that workload and deploy the new version. And then go back and hand-tune or fix and try again. So once again, this is all pets all the way down as each of these new versions I have to tweak to get on the right platform and I’m gonna be manually logging in and changing this. There is no immutability here. There is no way to go back and recreate this without going through all these same steps and hopefully you documented what you did. We all know coders are wonderful documentarians and they will always completely document what they do.

In the new world, we develop or push a new version into the CI/CD. I may change the manifest, I may change the code itself. And then it goes into the orchestrator and it’s committed in CI/CD. If I want to go back to this version I just recommit it. If I want to change it, I go on. But it’s immutable. Again, I’m not doing hand-tuning. I’m just saying here is a new version of this. Please go push it.
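
One way to picture "if I want to go back to this version I just recommit it" is a store of immutable, content-addressed releases. The Python sketch below is a toy under stated assumptions: `ReleaseStore` and its methods are hypothetical names, and real pipelines typically rely on Git commits and image digests rather than a class like this.

```python
import hashlib
import json


class ReleaseStore:
    """Toy store of immutable releases, keyed by a hash of their content."""

    def __init__(self):
        self._releases = {}
        self.history = []  # digests in the order they were deployed

    def commit(self, manifest):
        # Identical content always yields the same digest, so a release
        # can never be silently changed after it has been committed.
        blob = json.dumps(manifest, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()[:12]
        self._releases.setdefault(digest, blob)
        return digest

    def deploy(self, digest):
        self.history.append(digest)
        return json.loads(self._releases[digest])


store = ReleaseStore()
v1 = store.commit({"image": "app:1.0", "replicas": 2})
v2 = store.commit({"image": "app:1.1", "replicas": 2})
store.deploy(v1)
store.deploy(v2)
rolled_back = store.deploy(v1)  # rollback is just deploying a known release
print(rolled_back)
```

Rolling back involves no hand-tuning and no rediscovery of earlier steps. The earlier release is redeployed exactly as it was validated.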

What about the anchors? What’s important? In the previous world, we were worried about IP addresses, DNS names, Networks, servers, disks. All those things were important to the service and identified the service and the application I was deploying. These are just resources provided by underlying infrastructure. I’ve now anchored something I want fungible and changeable to something that I don’t want fungible and changeable, i.e. the application itself. This is sort of a bad mapping.

In the new world it’s metadata and labels. I’ve labeled this as a certain component of an application. I’ve labeled that it’s deployed in Europe. I’ve labeled that it needs to be exposed as a service. API endpoints, services, service accounts and TLS certs. These are the things that identify these microservices and the applications they constitute in the new world. IP addresses are irrelevant; networks, and servers, and disks are irrelevant. It’s only … What is important are these labels and other metadata that’s part of that manifest that we talked about earlier.
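
A small Python sketch of label-based identity follows. The workload records, label keys, and the `select` helper are hypothetical and only loosely modeled on the general idea of label selectors. It shows that a workload can be rescheduled to a new IP address and still be found, because it is matched by labels rather than by address.

```python
# Hypothetical workload inventory: IPs and label keys are made up.
workloads = [
    {"ip": "10.0.1.17", "labels": {"app": "orders", "tier": "frontend", "region": "europe"}},
    {"ip": "10.0.4.92", "labels": {"app": "orders", "tier": "backend", "region": "europe"}},
    {"ip": "10.2.0.5", "labels": {"app": "billing", "tier": "backend", "region": "us"}},
]


def select(workloads, **selector):
    """Return workloads whose labels include every key/value in the selector."""
    return [
        w for w in workloads
        if all(w["labels"].get(key) == value for key, value in selector.items())
    ]


before = [w["ip"] for w in select(workloads, app="orders", region="europe")]

# The backend gets rescheduled and comes back with a different address.
workloads[1]["ip"] = "10.7.3.40"

after = [w["ip"] for w in select(workloads, app="orders", region="europe")]
print(before)
print(after)
```

The same selector returns the same two workloads before and after the move. Only the addresses differ, which is why anything keyed on IP addresses goes stale at container churn rates.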

Underlying infrastructure. They’re all pets. Before every server was a pet, they’re all hand managed no matter what anyone tells you, and they’re all just a little bit different. There is no way to build a repeatable model in this world. Go look in your own data centers and tell me you’ve got all the same server everywhere and they’re all configured exactly the same way. You win a prize if that’s really the case.

In the new world, they’re just cattle. Servers can fail and I don’t care. The orchestrator will move the workloads to something else. Racks can fail. Maybe even entire data centers can fail. The orchestration … Because all of them are treated as just ephemeral, fungible objects that have no special characteristics, it doesn’t matter if they fail. I just move the workloads elsewhere. This is all driven by that intent, that manifest, those code artifacts that I’ve written rather than tying everything to physical … Specific physical capabilities, specific physical infrastructure. There’s no way to get that cloud portability and cloud scaling that we were talking about using the previous world model.
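
The "servers can fail and I don’t care" behavior can be sketched as a rescheduling function in Python. The `reschedule` function, the node names, and the least-loaded placement rule are all assumptions made for this illustration. Real schedulers weigh resources, affinity, and many other constraints, and this sketch assumes at least one healthy server remains.

```python
def reschedule(placement, failed, healthy):
    """Move workloads off failed servers onto the least-loaded healthy one.

    placement maps server name -> list of workload names. Ties between
    equally loaded servers are broken by server name so the result is
    deterministic.
    """
    new_placement = {server: list(placement.get(server, [])) for server in healthy}
    for server in failed:
        for workload in placement.get(server, []):
            target = min(new_placement, key=lambda s: (len(new_placement[s]), s))
            new_placement[target].append(workload)
    return new_placement


placement = {
    "node-a": ["web-1", "web-2"],
    "node-b": ["db-1"],
    "node-c": [],
}

print(reschedule(placement, failed=["node-a"], healthy=["node-b", "node-c"]))
```

Because no workload depends on a particular server, losing `node-a` only changes where `web-1` and `web-2` run.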

Previous world, underlying infrastructure. If you needed to change infrastructure, this was done on calendar weeks, months, years to get servers, data centers built, new networks provisioned, new firewall rules. All of this happened in calendar time. It doesn’t really work if we’re trying to deploy multiple times a day. In the new world, because the underlying infrastructure is fungible and what I’m doing is pushing code that changes how I interact with it, I can now make changes in days, hours, minutes, seconds, even sub-second. So I now can have the requests I’m making of the underlying infrastructure match the time frame in which I’m trying to deploy applications.

So that’s a bit of a brain dump on why this world is different than our previous model or all the various previous models we’ve had to date. I’d love to open it up for questions if we have any time. And we hope to see you on June 28th when I’ll talk about how this impacts security. As you might guess, there are some changes you’re gonna have to think about. The way you do security now is maybe not going to quite jibe with this new model.