Zero Trust Security: Supporting a CARTA approach with Network Security

 

Learn how to support what Gartner has termed continuous adaptive risk and trust assessment (CARTA) when building a CaaS platform using Kubernetes. Network security enables micro-segmentation and is a core component of a zero trust security model. It allows you to protect your workloads against threats without relying on assumptions about the network, infrastructure, and workloads.

Michael Kopp: Hello everyone and welcome to today’s webinar, Zero Trust Security: Supporting a CARTA Approach with Network Security. I am pleased to introduce today’s speaker, Christopher Liljenstolpe. He is the CTO of Solutions here at Tigera and is also the original architect behind Tigera’s Project Calico. He speaks at more than 60 meetups every year, educating people on networking and network security for modern applications.

He also consults with Tigera’s enterprise clients on security and compliance for modern applications: Kubernetes, Istio, container-as-a-service platforms, et cetera.

So without further ado, I’d like to hand things over to Christopher.

Christopher: Thanks Michael. So what we’re going to talk about today is the start of a series on Gartner’s CARTA model, how that relates to zero trust, and what we’re doing here at Tigera.

So, whether or not you believe in Gartner’s model, well, first of all, what is the CARTA model? The CARTA model is sort of a lifecycle view of zero trust that Gartner has put together; it envisions the way people will manage zero trust infrastructure beyond day zero, beyond setup. Whether or not you agree with the exact Gartner CARTA model, there are a number of interesting components to it that I think make sense to talk about in general scope.

So first let’s talk about zero trust. It’s a buzzword. Everyone here is pulling their buzzword bingo cards out right now. So zero trust is a bit of a buzzword today. But really it encompasses some very solid security concepts that we’ve had in the industry for a very long time and packages them up into one concept: zero trust.

One is the concept of least privilege. Things should only be able to make the connections that they need to make. You shouldn’t leave connectivity available for things that don’t need to have connectivity. So if we think about my classic Bobs, Alices and Charlies: if Bob and Alice need to talk, that’s fine. If Alice and Charlie need to talk, that’s fine. There’s no reason for Bob and Charlie to be able to communicate. So the concept of least privilege is that you don’t allow that. You only allow the connections or communication paths that are required to accomplish the mission, or to enable the application to work, et cetera.

The second part of zero trust is that we need to be able to ensure, with some degree of surety, that those really are Bobs, Alices and Charlies, and that we know what Bob, Alice and Charlie are. This is similar to what we think about with multifactor authentication, et cetera. Instead of using a single piece of data to identify something, you try to use as many as possible. Maybe it’s behaviors, maybe it’s identity in different realms. But you start trying to build a picture so that you know, or have a reasonable guarantee, that Alice is Alice and Bob is Bob.

So if you start taking the concepts behind MFA, multifactor authentication, and least privilege, we start approaching this concept of zero trust. The idea behind zero trust is basically that there is no ultimately trusted entity in the infrastructure or in your platform. People, applications, infrastructure, et cetera: none of that is trustworthy. At Tigera, for a long time before all this came up, we talked about not only protecting your workloads from the rest of the world, but protecting the rest of the world from your workloads.

Another little thing, and people who’ve been to my meetups have heard me say this: if you talk to the relevant organizations in both the commercial and government spheres, there are exactly two types of organizations in this world. Those that know they have an APT, or advanced persistent threat, in their infrastructure, and those that don’t.

So we have to assume, especially in today’s much more dynamic world, that the bad guys are already inside the perimeter. That’s where zero trust comes in. In a zero trust model, we’re going to make the assumption that the bad guy is already in, and we’re still going to try to protect the infrastructure as best as possible. So that’s zero trust, and it’s a very long prelude to this slide.

So once you start talking about zero trust, one of the things Gartner has said is that zero trust is a point-in-time thing. You configure your environment based on the way you think your environment should be working, based on your assumptions. And in a containerized world those assumptions are usually better than they were in the legacy environment, because each of these containers does fewer things and it’s easier to scope. But it’s still a point-in-time thing.

What Gartner is saying, and it’s sort of obvious, is that you then need to evaluate how those controls are working. Are you properly identifying workloads? Is any endpoint or entity in the network able to spoof and claim it’s a Bob when it’s really not, and how do you detect that? Are the controls that you put in place too restrictive, or not restrictive enough?

So the whole idea behind CARTA, or one of them, is that it requires a continual reassessment of zero trust-mediated flows. So basically, you do your zero trust, you then watch the infrastructure, you see the behavior, and you make adjustments as necessary and iterate. Wash, rinse, repeat. That’s sort of the CARTA approach. We’ve got the nice Gartner graphic here, so kudos to Gartner for drawing some nice pictures and wrapping a name around this concept.

So one of the ways that you enable a zero trust environment is, again, this concept of least privilege. What should be allowed to talk to what? It’s a whitelisting model versus a blacklisting model. Most people think that whitelisting is inherently difficult. And in a legacy environment, where you don’t really know what you have, where you’ve inherited big, hairy applications that open connections to everything and their brother, it is hard to do.

In a more modern world, where you’re defining these individual microservices, each microservice is only going to have a couple of connections, and it’s easily scoped. As an example of the legacy problem, a previous company I worked for at one time had a very elderly platform, and no one really understood why that platform existed. So it was turned off. That’s sort of the ultimate form of micro-segmentation: you don’t know what this is, so let’s turn it off. You check, nothing happened, everything’s still fine. They discovered a couple of months later, when they went to quarter close, that the platform was actually used as part of quarter close, so they had to resurrect it.

So when you’re doing micro-segmentation or whitelisting in legacy environments, there’s a bit of a challenge. We’ve talked about that some before, and we’ll talk about it more in other webinars coming up.

But basically, micro-segmentation is a form of whitelisting. It’s usually whitelisting based on the personalities or roles of things, rather than on individual things. So, all the things that are our customer front ends should be able to talk to all the things that are our customer databases. Not: this particular pod, and this pod, and this pod should be able to talk to this pod, this pod, and this pod.

So when we do that, when we decide that we’re going to use micro-segmentation as our tool to achieve least privilege, which again is part of zero trust, it allows you to enable a default-deny network posture. As I said, we are not relying on assumptions about things within the network, and you’re establishing trust in the workloads before any network access is granted. Is this really a Bob, and if so, what is Bob allowed to talk to? Rather than allowing that pod on the net and then later trying to figure out what it’s trying to talk to and deciding whether it should be allowed.
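As a rough illustration of that default-deny starting point, here is a minimal sketch of a Kubernetes NetworkPolicy that denies all ingress and egress for every pod in a namespace; the namespace name is a placeholder, and this is not a policy shown in the webinar, just an example of the posture being described. Connectivity is then granted by layering explicit allow policies on top.

```yaml
# Hypothetical sketch: default-deny posture for one namespace.
# Selecting every pod (empty podSelector) and declaring both policy
# types with no rules means no ingress or egress is allowed until
# explicit allow policies are added.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: my-app        # placeholder namespace
spec:
  podSelector: {}          # applies to all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
```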

So let’s talk about what micro-segmentation is. We’ve done this before, but now we’re doing it within the context of CARTA. So we have an application graph, and this is the Bobs, Alices and Charlies. We haven’t done anything yet, so the Bobs can talk to the Alices and the Charlies, the Charlies can talk to the Bobs and the Alices, and everyone can communicate with one another. That’s great. Absolutely no security here. The end result will be predictable. So most organizations in the containerized world …

Christopher: … have, frankly, for a long time, said the perimeter firewall is the solution to all ills. For those of you who like Gary Larson, there’s a Far Side cartoon. If you search for “Gary Larson hard crunchy outside, soft chewy inside,” you’ll see the way I view perimeter firewalls. You have a perimeter firewall, the bad guys are on the outside, the good guys are on the inside. It’s sort of like building a wall and a moat around your castle. The problem is, once the wall and the moat are breached, there’s nothing protecting you at that point. Once somebody gets in, they have full freedom of movement, sometimes called lateral movement, et cetera.

And this is the way most of the breaches that have made it onto the front page of the New York Times, Financial Times, et cetera, have started. Somebody gets inside that cordon sanitaire, whether through social engineering, a bad piece of imported code, or whatever, and then they have free range to move laterally around the entire estate until they find what they’re looking for, in some cases over a period of months without being detected. This is also why only very early medieval castles had just a wall and a moat; pretty soon they started adding baileys, and inner walls and outer walls, and towers, and dungeons, because a single perimeter is not going to stop the attack once it gets in. So, as a general rule, putting a cordon sanitaire, or perimeter firewall, around things and calling it done is probably not the right answer. That’s not to say perimeter firewalls aren’t useful. They’re just not sufficient.

So, micro-segmentation says: instead of putting the perimeter around all of the things as one big blob, why don’t we put the perimeter around each endpoint? And I hate it when people say this is a new idea. Actually, back in the early days, or the middle-aged days, of internet networking in the ’80s and ’90s, which maybe I have enough gray in my beard to say I was part of, that’s actually the way we viewed things. If you needed to defend your host on the net, well, we didn’t have firewalls back then. We can have a long conversation about where firewalls came from, but you actually had host protection. You secured every host on the network.

So, we’re basically coming back … in this industry, everything that’s old is new again. So, we’re going back now to securing endpoints. The idea is that you put a perimeter around the endpoint itself; in a containerized environment, that’s around each and every container. So now I put a wall around the Bobs, and the Alices, and the Freds: not only a wall around Bob that says only Alice can talk to me, but also a wall around Alice that says I can only talk to Bob. So you actually put these perimeters in place so they’re both inbound and outbound. You now have multiple points of protection. Your filters are not just inbound filters; they’re outbound policies as well.

Now, if you again have the intruder, the aggressor, the adversary get in, they have compromised exactly one workload, or one very small portion of your overall infrastructure, and they’re contained. Lateral movement becomes very difficult, because the first thing the attacker has to do is figure out what lateral movement is even possible from within this pod. What connections are allowed? And if this is rendered correctly, like it is in Tigera’s commercial and open-source solutions, then from within the pod you can’t introspect and find out what’s allowed and what’s not allowed, unless you’re doing a brute-force door knock: scan every possible thing and see what’s allowed and what’s not. Hopefully, and we’ll talk about this more when we get to our visibility webinar coming up soon, if something starts doing a door knock, rattling your door knobs from within an infected pod, you’ll detect it. And you’ll think, “Hm, this is strange. Nothing should be trying to do a full port scan of my infrastructure.” And you shut the thing down. Problem solved. So there’s really no way, if this is done right, to introspect from within the pod and figure out what paths are available to you. And the pure act of the aggressor trying to map out what is allowed, so they can figure out how to do lateral movement, is going to trip the alarms you need to be able to say, “Hey, there’s something wrong here.”

So, that’s a little bit about how you do this. You’re boxing that aggressor in, and the mere attempt to get out should set off all kinds of alarms in your infrastructure. So let’s think about a basic application. I’ve got a microservices application here. It’s a foobar application; let’s say I’m going to start a book review site. So I’ve got some details about the product. I’ve got some ratings: one star, two stars, three stars, four stars, five stars. And even some reviews: what did the people who gave those ratings have to say, and why did they rate it that way? This is a standard e-commerce kind of thing, but it’s basically three applications all being called by another application called Product Page.

So, let’s look at how this works. Product Page, the front end of the application, makes calls to two things: details and reviews. Details about the product, and reviews of the product. And as part of the reviews, you then go into the ratings microservice to get what the actual ratings were for a given review. So we now have three microservices: ratings is feeding reviews, and reviews and details are feeding Product Page. That’s how your microservice graph works. That’s great. What you really want to make sure now, of course, is that Product Page can only talk to reviews and details, and reviews can only talk to ratings. That gives you your least privilege. I don’t need to allow details to talk to ratings, or Product Page to talk to ratings, or reviews to talk to details. None of those are valid paths. They shouldn’t be allowed in the infrastructure.

I now have another issue. Let’s say one side is the US and the other side is Europe. So I have two different versions of this application, one running in Europe and one running in the US. This might be because of GDPR requirements, or language, but there are potentially multiple versions of this application. So, let’s think about policy delegation. The developer knows that Product Page needs to be able to talk to reviews and details, and knows that reviews needs to be able to talk to ratings. That’s something the developer knows, and that’s something they want to write policy for.

Similarly, the security or compliance team has a policy that says only US apps should be able to talk to other US apps, and only EU apps should be able to talk to other EU apps, and never the twain shall meet. Don’t cross the streams, for whatever reason that might be. So, how do we render this? First, let’s take a look at how we can do some policies around … I think this slide maybe got mis-ordered, but that’s okay.

Christopher: I think that was my bad, but let’s come back to what this policy looks like in just a minute. Let’s go down now and first think about what the developer does. The developer says Product Page can talk to reviews and details, and reviews can talk to ratings. They don’t care where this is running, so basically they’ve drawn this graph: salmon can talk to green and blue, and green can talk to yellow. That’s the developer’s input.
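Rendered as label-based policies, the developer’s half of that graph might look roughly like the sketch below; label names, policy names, and the namespace are illustrative assumptions, not something shown on the slides.

```yaml
# Hypothetical sketch of the developer-owned microservice graph:
# productpage -> reviews and details, reviews -> ratings.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: reviews-and-details-from-productpage
  namespace: bookstore            # placeholder namespace
spec:
  podSelector:
    matchExpressions:
    - {key: app, operator: In, values: [reviews, details]}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: productpage
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ratings-from-reviews
  namespace: bookstore
spec:
  podSelector:
    matchLabels:
      app: ratings
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: reviews
```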

In parallel, the infosec team has a policy that says greens can only talk to greens and salmons can only talk to salmons. Or, US can only talk to US and EU can only talk to EU. So what we really want is a set of policies, or two separate sets of policies, and they should be rendered separately. One policy says US nodes can only talk to US nodes, one says EU nodes can only talk to EU nodes, and the developer has what we call the microservice graph.
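One way the security team’s half might be sketched, assuming every workload carries a region label, is a cluster-wide Calico policy evaluated ahead of the developer policies. The label name, policy name, and ordering are assumptions for illustration, not something shown in the webinar.

```yaml
# Hypothetical sketch: US workloads may not exchange traffic with EU
# workloads. Evaluated at a low order so it is considered before the
# developer's policies; traffic that isn't denied here falls through
# to those policies. A mirror-image policy would cover the EU side.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: us-deny-cross-region
spec:
  order: 100
  selector: region == 'us'
  ingress:
  - action: Deny
    source:
      selector: region == 'eu'
  egress:
  - action: Deny
    destination:
      selector: region == 'eu'
```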

So, if you go down here and start thinking about how you do this in a Kubernetes world, or in a Tigera world, your policies need to be label-based, they need to be declarative, and they need to be rendered dynamically, because this is a dynamic environment. So basically, we don’t want to use IP addresses as the foundation for our filters, because IP addresses are ephemeral in Kubernetes, and we don’t want to have to write policies that are static in nature, like we do with firewalls today. I don’t care where it’s running. From the developer’s standpoint, Product Page should only be able to talk to reviews and details. It doesn’t matter where it’s running, and it doesn’t matter about any other policy. The rule is simply what Product Page should be able to talk to.

So, let’s talk a little bit about that now, and then we’ll come back to the policies a little bit more. Here’s another example. I have an application; let’s take a look at what that policy looks like. In this case, I have a ratings application that needs to talk to a database application, and I’ve got another application, a helper, that shouldn’t be able to talk to the database. This is a fairly simple, declarative policy. It basically says that anything labeled role: db, and it doesn’t matter if there’s one of them or a million of them, will allow traffic in, as an ingress rule, from anything labeled role: ratings.

Christopher: Anything labeled role: ratings, on port TCP 6379. With no other policy set, this is the only traffic that’s going to be allowed in this cluster. So in this case, role: db will allow traffic in from role: ratings but will not allow traffic in from role: helper. This is a declarative policy. We’re not saying specifically which pods, or anything along those lines. It’s basically selector-based: anything labeled role: ratings should be able to talk to anything labeled role: db. That’s a declarative policy. This is how it looks. First of all, this is a network policy. The first thing is we name it, and we put it in a namespace for this application. Then we say this policy is going to apply to anything labeled role: db. Then, for anything this policy applies to, it’s going to allow ingress in from things labeled role: ratings. This is a basic declarative network policy. There are other things that we can do. This is where that first policy should come from. Let’s go back, Michael, to slide 15.
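The slide itself isn’t reproduced in this transcript, but a policy matching that description would look roughly like this; the policy and namespace names are placeholders.

```yaml
# Sketch of the policy described above: anything labeled role: db
# accepts ingress on TCP 6379 only from things labeled role: ratings.
# With no other policy in place, everything else is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-ratings
  namespace: store               # placeholder namespace
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: ratings
    ports:
    - protocol: TCP
      port: 6379
```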

Christopher: On slide 15 we just talked about basic labeling, et cetera, and doing L3 and L4, but now let’s talk about doing this further up the stack, at layers 5 through 7. Remember we talked about zero trust and the idea of using multiple enforcement points and multiple forms of identification. In this case, not only do I have things labeled app: details, and not only am I going to apply this policy to app: details; I’m also going to allow traffic in from things with a specific kind of identity in Kubernetes called a service account, with the name product page. Now, the interesting thing about the service account is, if you’re running … sorry, momentary loss of pointer. If you’re running Istio, then service accounts map to TLS certificates and can be used for mutual TLS authentication. So with this policy, not only are we going to match at layer 3 and layer 4; part of the match is also going to be: not only are you a Bob, but you also have a service account attached to you named product page, and that drives an mTLS authentication.
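A sketch of that kind of rule, using Calico’s policy schema where a rule’s source can match on Kubernetes service accounts; the names and namespace here are illustrative, not taken from the slides.

```yaml
# Hypothetical sketch: the details workloads only accept traffic from
# workloads running as the product-page service account. With Istio,
# that service account identity is carried in the peer's TLS certificate,
# so this match also implies a successful mTLS handshake.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: details-allow-product-page
  namespace: bookstore            # placeholder namespace
spec:
  selector: app == 'details'
  ingress:
  - action: Allow
    source:
      serviceAccounts:
        names: ["product-page"]
```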

Now, not only is this a network filter, it’s going to mandate that the appropriate mTLS exchange happens, so the client can authenticate the server and vice versa. We’ve now used TLS as part of our authentication mechanism. If we go down now to slide 25, we can even group service accounts. Just like everything else in Kubernetes, you can attach a label to something, or to multiple things, and have them behave as a group. Here, what we’re going to do is not name a specific service account, but allow any service account so long as the service account itself has a label of ratings equal to reader. Service accounts are RBAC-controlled, which means you can very tightly control who can manipulate these service accounts. Now it doesn’t matter which service account it is, so long as that service account itself, this sort of metadata of metadata, has a label of ratings: reader. Similarly, I can start putting these things together. I can say that if I have an app: reviews, I’m going to allow traffic in provided the application label is product page and the service account has a label of reviews: reader.
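A sketch of that grouping, again in Calico’s schema, where the service account is matched by label selector instead of by name; labels, names, and the namespace are illustrative.

```yaml
# Hypothetical sketch: allow ingress from any workload whose service
# account carries the label ratings: reader, regardless of the service
# account's name.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: ratings-allow-readers
  namespace: bookstore
spec:
  selector: app == 'ratings'
  ingress:
  - action: Allow
    source:
      serviceAccounts:
        selector: ratings == 'reader'
```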

Now I’m doing authentication based on both the L3/L4 attributes and the TLS identity. I’m making it harder and harder to subvert this, because there are now multiple forms of identification in use, both cryptographic and based on what the orchestrator knows about a given endpoint. I can go even further and put all of this together: even if you’re an app: product page and you’ve got a TLS certificate that is tied to something labeled ratings: reader, if you try to do an HTTP POST, or a GET to anything other than a URI prefixed with ratings, it will still be blocked.

This is where we start talking about ongoing introspection of your flows. I’ve allowed the flow because it matched my service account match and my pod selector match. The flow is allowed, but the minute that flow attempts to do something we didn’t expect it to do, in this case anything other than an HTTP GET on something prefixed with ratings, we’ll block the traffic and alert. This is the beginning of automated introspection of all the flows, and again, that continual evaluation. You can then go back later and say, “Well, why did this pod that had all the right credentials try to do something wrong? Was it a configuration error, an error in code, something more malicious, or did my boss just hire someone to red team me, and they’re testing my security and I just passed? Woo hoo.”
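Before moving on, here is a sketch pulling together the layers just described: pod selectors, service account labels, and HTTP method and path matching. This uses Calico’s application layer policy fields, which rely on the Istio integration to enforce, and all names, labels, and the namespace are illustrative assumptions.

```yaml
# Hypothetical sketch combining identity layers: the source must carry
# the product page app label, run under a service account labeled
# reviews: reader (mTLS-backed with Istio), and even then only HTTP GETs
# to paths under /ratings are allowed. Anything else is blocked and can
# be alerted on.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: reviews-allow-ratings-reads
  namespace: bookstore
spec:
  selector: app == 'reviews'
  ingress:
  - action: Allow
    http:
      methods: ["GET"]
      paths:
      - prefix: "/ratings"
    source:
      selector: app == 'productpage'
      serviceAccounts:
        selector: reviews == 'reader'
```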

Anyway, these policies you’ve been seeing, let’s talk a little bit more about them. For those of you who’ve managed firewall policies in the past, they’re big, complex, multifaceted beasts. This thing has all of these potential connections; this other thing has all these other potential connections. That makes them hard to introspect and hard to break down. How many places do I need to change to say I’m going to allow 443 now instead of 80? All my firewall rules might have to change, because these are unique to every endpoint. Instead, you can start thinking about micro-policies.

Instead of saying that this Bob should be able to do all of these things, maybe the things that are Bobs have different characteristics. Maybe they’re a foo server or a foo client. Maybe Alice is also a foo server. You can say that foo servers should allow traffic from foo clients. It doesn’t matter if it’s a Bob or an Alice; so long as it’s a foo server, it will allow traffic from foo clients. Now I have exactly one policy to change when foo’s security policy changes, when I change foo’s port number or require TLS. I change it in one place, and it is reflected across the entire fleet, no matter whether they’re Bobs, Alices, or Georges. Similarly, and this slide maps this out, I can have a foo service that allows traffic from foo clients and a bar service that allows traffic from bar clients. In this case, anything that’s labeled a foo client can only talk to a foo service. But this orange workload here is labeled both foo client and bar client, so it can talk to both. There are only two policies.
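A sketch of those two micro-policies; label keys, ports, and policy names are all illustrative assumptions rather than anything shown in the webinar.

```yaml
# Hypothetical sketch: one policy per service "personality". Changing
# foo's port or requiring TLS means editing exactly one policy, no matter
# how many workloads carry these labels. A workload labeled as both a
# foo client and a bar client can reach both services.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: foo-service
spec:
  selector: foo-server == 'true'
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: foo-client == 'true'
    destination:
      ports: [8080]             # foo's port, changed here once for the fleet
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: bar-service
spec:
  selector: bar-server == 'true'
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: bar-client == 'true'
    destination:
      ports: [9090]
```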

If you thought about this in classical firewall rules, there would be three policies here, not two. Here, bar allows traffic from bar clients and foo allows traffic from foo clients, and that allows for what is now starting to be a more interesting service graph than a simple [inaudible] direction. So that gives you a little bit about micro-policies, and it allows composition of micro-policies. So we start thinking about making policies simpler. By matching the concept of micro-policies to micro-segmentation, introspecting our traffic as much as possible, even at layer 7, and applying concepts of least privilege, we start getting to this zero trust model and, more importantly, to this … again, you may or may not subscribe to Gartner, but to this concept they discuss in CARTA, which is not only applying zero trust, but continually reevaluating whether your zero trust posture is the correct zero trust posture. So that was a lot of stuff that I went over in the last 30 minutes. I think we’ll end it here.

Michael Kopp: Well, if you have any questions, feel free to go ahead and shoot us questions at [email protected] and we will respond to them. Once again, I’d like to thank you all for attending and hope you have a great day. Thank you.

Christopher: Thank you.