Containers and Kubernetes adoption are gaining incredible momentum in enterprise organizations. Gartner estimates that 75% of organizations will be running containerized applications by 2022.
However, there are many challenges to moving containerized applications to internet-facing environments while maintaining security:
- Firewalls are necessary but cannot protect Kubernetes pods that keep changing IP addresses
- Security processes are designed for and rely on a zone-based architecture, but Kubernetes doesn’t fit in that architecture
- Kubernetes Network Policies are a new concept for network and security teams to learn, but those teams are stretched too thin and have no time to invest in learning them
These problems would go away if the security team’s existing tools and processes worked for Kubernetes.
Watch this on-demand webinar and learn how Tigera is the first and only Kubernetes security solution to integrate with the security team’s firewall manager to implement their security controls in Kubernetes. The presentation will include a live demo using Tigera Secure and a popular firewall manager.
Hello, I’m Andy Wright and I work in the marketing team for Project Calico and Tigera. I have about 20 years of experience in enterprise IT, and that spans a variety of different positions, from being a software engineer to working with customers as a pre-sales engineer, to being a product manager, and eventually becoming a marketer, which is what I do currently. Today what I’m going to talk about is firewalls.
The first thing that I want to do is give an overview of zone-based security architectures and firewalls: how they work, why they’re in place, and the value that security teams get from them. It’s important to understand that context before I go any further. The first thing to go over is the network security architecture, and oftentimes this is called a zone-based security architecture. We find that some call it something different, but when you use this terminology, it’s generally understood by most.
A zone-based security architecture is the existing architecture that’s typically implemented by security teams, so this is something which has been in place for a very, very long time. All of your applications and VMs are probably being managed this way if they touch the internet in any way, and I’ll talk about that in a moment. Really, any publicly facing apps are typically going to fit into this architecture.
It’s also the foundation for the security team’s investments in time and tools. Their processes are wrapped around this architecture, and that’s really where their experience lies today and what’s known. You may need to deploy Kubernetes to this architecture, especially if you’re going to expose workloads to users via the public internet. Zones are essentially sub-networks, and I’ll walk through this diagram.
But it starts with what we call the untrusted zone. This is the internet, right? There are a lot of bad things out there that might be trying to get in, and you have a firewall between the internet and any of your zones. That firewall is commonly known as a perimeter firewall. That’s kind of the shell around everything that you’re running inside of your data center, and it attempts to keep the bad guys out.
Then you have what’s often called the DMZ, or demilitarized zone, and this is where you have your internet facing applications. These are public servers or partner servers and anything that an end-user could connect to that is outside of your network. Then we also have what we call a trusted zone. This is generally where your application workloads are running, that shouldn’t be accepting inbound connections from the internet but they might be connected to via some of your internet facing servers.
Oftentimes then we have a restricted zone, this is your more sensitive workloads. That could be things like your databases or perhaps applications that can write to or read customer data. Sometimes, we also see other types of zones in place as well. Management zone, where you might have some of your management servers and software, your backups that are running. Generally, there’s also some kind of an audit zone. Across these different zones you may need to have some common place that you log information or generate alerts.
This is what we would call a zone-based architecture, and I can walk through how it is used. The first thing is that applications will be profiled for risk. This is all about: if this application is compromised in some way, what’s the worst thing that could happen? Based on that, those applications will be placed into the proper zone. Then from there, between those zones, you use a firewall. Oftentimes, these are called inline firewalls.
You have your perimeter firewall on the outside and your inline firewalls on the inside, and they’ll monitor and control the flow of traffic. Visibility is important with this architecture, because what you’re doing here is specifying which workloads can cross these zones, and then you want to monitor any traffic which is crossing a zone. Generally, that’s where the firewall is going to be monitoring. Generally, within the zone that traffic is trusted. If you’re within a DMZ and you’re connecting to something else in the DMZ, that’s generally not being passed through a firewall; same thing with a trusted zone or any other zone. Oftentimes, though, there will be a need to monitor that traffic within a zone. This traffic is generally known as East-West traffic, whereas the traffic going in between zones and out to the internet is known as North-South traffic. East-West traffic can be sent through the firewall using what is generally known as hair-pinning, where all connections moving from one workload to another, such as those in the restricted zone, are passed through the firewall.
One of the challenges with this, and why a lot of the other zones are left pretty open, is that these firewalls are generally sized by gigabits-per-second throughput. Hair-pinning then becomes very expensive in terms of both latency and the cost of your firewall, because high-throughput firewalls can oftentimes cost millions of dollars.
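To make the terminology concrete, here is a minimal Python sketch of how a flow could be classified under the definitions used in this talk. The `Flow` type, the zone names, and the choice of hair-pinning only the restricted zone are illustrative assumptions, not taken from any particular firewall product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_zone: str
    dst_zone: str


def classify(flow, hairpinned_zones=frozenset({"restricted"})):
    """Return (direction, passes_through_firewall) for a flow.

    Traffic crossing a zone boundary is North-South and always crosses a
    firewall. Traffic inside a zone is East-West and is only inspected
    when that zone is hair-pinned (an assumption made for this sketch).
    """
    if flow.src_zone != flow.dst_zone:
        return ("north-south", True)
    return ("east-west", flow.src_zone in hairpinned_zones)


print(classify(Flow("dmz", "trusted")))            # ('north-south', True)
print(classify(Flow("dmz", "dmz")))                # ('east-west', False)
print(classify(Flow("restricted", "restricted")))  # ('east-west', True)
```

The second result is the visibility gap discussed here: intra-zone traffic in a zone that is not hair-pinned never reaches a firewall.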
Generally what you’ll see is that hair-pinning is done only for those high-sensitivity workloads. At a higher level, what is the zone architecture put in place for? The first thing is to protect the environment. You establish boundaries and then you control the communication between those boundaries or zones. Then what you want to do is detect any indicators of compromise: for example, a connection attempted between zones that shouldn’t be, or perhaps a connection that looks a bit anomalous.
This is where things like intrusion detection systems come in. They’re looking at the signatures of these different network flows and identifying things, mostly traffic which is coming through. Then what you want to do is contain any type of breach or infection of the environment. By splitting this into multiple zones, what you’re able to do is have smaller pockets of applications, so that if you do have some advanced persistent threat, it has fewer workloads that it could go infect.
The other thing is that traffic between zones typically moves in one direction. You generally can’t move from, for example, your restricted zone out to the public internet through the DMZ. That would be bad, because that could be data being exfiltrated out of your databases. Generally, traffic is going to move in one direction, so maybe if you can get in, you can’t get out. Now, what I talked about with these firewalls and the zone-based architecture sounds very much like a legacy thing that might be running on-prem in your data center.
But it’s important to note that this is implemented in cloud-based architectures as well. For example, this AWS reference architecture, which I found through a Google search for AWS reference architectures, clearly shows a DMZ as well as other back-end sub-networks that are not connected to the public internet. Same as AWS, Azure has these types of zone-based architectures. You see a lot of this with GCP as well.
It’s not all about on-prem. On-premises is certainly something that we see quite a bit, but this is put together for cloud-based architectures as well. The overall benefit to the security team is that they’re able to limit the exposure and the breadth of the impact. They’re able to group their assets. By grouping assets, they can actually prioritize their time.
If there are a thousand workloads to look at, it’s difficult to prioritize which ones are really critical to keep your eye on, so they tend to spend more time on those high-risk areas. That visibility provides the ability to detect suspicious activity as it happens, so you have anomaly detection in each one of those firewall zones. Then you can contain any advanced persistent threat to a single zone, so it can’t really spread and infect the entire environment.
Here’s the thing: Kubernetes doesn’t really jibe very well with the zone-based architecture. Kubernetes is generally a wide open, flat network, and trying to segment it in this way is very, very challenging. I’ll just walk through a little bit of the technical background on why. Kubernetes itself will be running many, many different workloads. Those workloads are scheduled, and they’re deleted and rescheduled, and they’re scaled.
As that happens, you use IP address management to assign a new IP address to each one of these workloads. There’s a certain CIDR block that’s assigned to any Kubernetes cluster to enable that to happen. Now you can imagine that those workloads are reusing and recycling these IP addresses. It’s not really predictable upfront what the workload identity is without … You use labels within Kubernetes, but you can’t really use an IP address to do that, which is really where some of the firewalls are focused.
If you have, for example, a workload that needs to cross through your inline firewall from one zone to another, let’s say we have workload A and it wants to communicate with workload B. Since you can’t specify just the single workload’s IP address, what you end up having to do with the firewall is open up an IP range. The IP range is generally going to match the size of the CIDR block of that Kubernetes cluster.
By doing this (sometimes we call this theatrics), what you’re actually enabling within this environment is every single workload in one zone connecting to every single workload in the other zone. That just really punches a giant hole in that zone-based architecture. If we look at Kubernetes workloads and we try to split them within the zone-based architecture, the first thing is we’re always going to have a perimeter firewall.
Oftentimes, we walk into environments and that’s the only thing that they have: a perimeter, and the rest of the workloads can communicate with one another. But let’s say that you do try to implement a zone-based architecture. The first thing is that you’re going to have these large IP ranges between each one of these firewalls. This could be the case whether you’re using a single Kubernetes cluster and trying to carve it up or you’re using multiple clusters, and we’ve seen both in practice.
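The size of those IP ranges can be made concrete with Python's standard `ipaddress` module. The pod CIDR `10.244.0.0/16` and the address of workload A below are hypothetical values chosen only for illustration; real clusters will differ.

```python
import ipaddress

# Hypothetical values: a /16 pod CIDR and workload A's current address.
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
workload_a = ipaddress.ip_address("10.244.3.17")

# The rule the security team would like to write: exactly one workload.
intended_rule = ipaddress.ip_network(f"{workload_a}/32")

# The rule they end up writing, because workload A's address is not stable.
actual_rule = cluster_cidr

print(intended_rule.num_addresses)  # 1
print(actual_rule.num_addresses)    # 65536
print(workload_a in actual_rule)    # True
```

Under these assumptions, a rule meant for one workload admits every address the cluster could ever hand out, which is the opening in the zone boundary described above.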
The next challenge is that oftentimes, some of these modern applications may make use of third-party APIs, or they might be making use of cloud-native resources like ElastiCache, if you’re running up in AWS, or perhaps an RDS database, and you end up having to open up that large IP range outside of the firewall. A great example here is if you have a security group and you’ve got an RDS database sitting in that security group: you’re generally going to have to let the entire Kubernetes cluster communicate with that RDS database. There is no fine-grained control.
There’s also limited visibility here, because at this point you don’t really see any of the East-West traffic that’s happening within Kubernetes itself. That visibility isn’t inherently there. The firewalls that are in place have no real visibility because they’re just allowing all traffic through, and they don’t have any context for Kubernetes. Firewalls don’t know what a namespace is and they don’t know what pod labels are, so they’re just letting everything through. If you do have some type of a threat that hits your DMZ, it can have full lateral access to the entire environment. The challenge with the firewall is that it’s actually not able to implement that zone-based architecture.
There are a couple of ways to fix this, and we’ve done quite a few webinars on some of these topics. We do around 100 meetups a year, where we help folks understand how to implement this within their Kubernetes environments using Project Calico, our free open source software, and this can be done.
The first is network segmentation using network policy. If you’re not using network policy within your cluster, then by default the cluster is wide open and everything can communicate with everything. But you can use network policy to create a DMZ, a trusted zone, or a restricted zone. These can be different namespaces; there are different ways to implement this. At that point, you can control the flow of traffic, so you can use a default deny for any traffic which is going between these zones, then specifically come in and whitelist any traffic that should be allowed.
You’ve effectively implemented a zone-based architecture at this point. Also, if you need to have compute isolation, you’re able to use a node selector for these workloads, based on the zone, to assign pods to specific nodes. This is going to ensure that if you have a pod that’s supposed to run within the DMZ zone, it’s also only running on the DMZ servers. Now, this does achieve the network isolation and workload isolation which was the goal of the zone-based architecture.
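As a rough sketch of the default-deny-then-whitelist pattern described above, the following Python builds two Kubernetes `NetworkPolicy` manifests as plain dictionaries and prints them as JSON. It assumes one namespace per zone and that each namespace carries a `zone` label; the policy names, the port, and that labeling scheme are illustrative assumptions rather than anything shown in the talk.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_from_zone(namespace, from_zone, port):
    """Whitelist TCP ingress on one port from namespaces labeled zone=<from_zone>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-from-{from_zone}", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"namespaceSelector": {"matchLabels": {"zone": from_zone}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policies = [default_deny("trusted"), allow_from_zone("trusted", "dmz", 8080)]
print(json.dumps(policies, indent=2))
```

A few caveats on this sketch: a default deny on egress also blocks DNS lookups unless a further policy allows them, and if the DMZ namespace has its own default deny, it needs a matching egress allowance before the connection works end to end. Compute isolation would be handled separately, through a `nodeSelector` in the pod spec.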
But one of the challenges here is that this is a completely new model. We’re talking about network policies; we’re not talking about firewall rules at this point. This is just something very new to learn. The other challenge is that one of the things we lose in this environment is visibility into the traffic flows. If something does flow from one zone to the next, do we have visibility into which flows those are, and do we have some form of a log of that so that we can manage it and detect anything strange that shouldn’t be happening?
You would say, “Okay well, this is great. We can fit inside of that architecture. The security team should learn this, right?” One of the challenges that we hear, though, is that there’s this rapid growth of containers, and everybody’s running them. I think I see almost every single week that Gartner quote that by 2020, 70% of organizations will be running containers. But what Gartner doesn’t say is how many containers they’re going to be running and how many Kubernetes clusters they’re going to have.
But if you go talk to some of these large companies, the security teams will say, “Yeah, that’s like less than 1% of what I got to manage.” That’s the first thing: why would you go learn something new and reinvent your model if it’s just such a small percentage of what’s happening here? You’ve got a lot of friction there. The other challenge here, and this is why we oftentimes see the perimeter firewall in place and nothing else put together, is that most security teams know about containers, right? That’s a workload, it can be assigned an IP address, and the container can be a relatively static workload. We see that oftentimes as the first step to moving to containers.
But container orchestration is generally something that’s very, very new. We can stand out at Black Hat, for example, and talk to people coming up to our booth. One of the first questions is, are you familiar with Kubernetes? I always get a head shake.
It’s just not very well known, so the concept of orchestration is very new to these teams, and they may not even know that it’s a problem in many cases. We do know that the teams are using this zone-based architecture, Kubernetes doesn’t quite fit into it, and the security teams don’t really have time or resources to learn these cloud-native concepts. You’re likely not going to get them to be developing YAML files for your Kubernetes workloads and segmenting them out in that way.
They’ve got a lot going on with the other 99.7% of their environment. The challenge is that you can segment out your network and implement the zone-based architecture, and you get that security, right? It’s just not aligning with the security team’s model. If you need to go that next step to align to the security team and the tools that they use, there’s another approach that you can take.
The goal here is to enable the security team to deploy Kubernetes to their zone-based architecture, to get the same visibility, to get the same fine-grained access controls, and to use the tools that they’re already using today. This would be the vision and the easiest way to work with that team in the early days of them adopting these cloud-native workloads. What I’m going to do today is a high-level demo of that being implemented through a firewall manager called Panorama, which is from Palo Alto Networks.
I’m actually going to go ahead and launch my screen, and I’ll bring you guys through what Panorama is here. Let me share my screen, okay. What I’m showing here is Panorama. Panorama is a firewall manager, and generally, if you’re going to have dozens or more firewalls running within your estate, you really don’t want to log in to each firewall to manage the firewall rules. You’re going to want to have some centralized platform that manages all of your firewalls. That’s what Palo Alto Panorama is.
Now we have policies that are listed here, and these policies are for different environments that we have. We have a DMZ, we have a restricted zone, and we have a trusted zone. What we can see with these policies is, for example, DMZ source would show that for anything that’s coming from the DMZ, whether that’s any address, user, or profile, where the destination is restricted or trusted, we should deny that traffic. The same thing would go for the other zones. By default, it will deny any traffic which is crossing those zones. Of course, my mouse moving in that direction doesn’t work so well. Some of the others down here, like intra-zone within the DMZ, restricted, and trusted, will allow that traffic. This is just kind of a basic, high-level setup of some firewall rules.
There’s a thing called a device group, and what this is, is a hierarchy. These are the rules, and then I have several different firewalls, generally, that would inherit those rules. Then you can add specific rules to each one of those firewalls. Now in this case, these are different Kubernetes clusters that we have. If I select our AWS cluster, these are all in yellow because these are zones which have been inherited from the one above it. That’s just a quick overview of Panorama.
What I’m going to do now is show you Tigera Secure. I’m not going to walk through a demo of this product, but what I’m going to use it for is our policy board. Our policy board is a more visual way to describe policies without me having to walk through YAML files on a command line. I’m going to try to stay out of the command line, except for one little command here today. What we have here is a policy board, and this shows all the policies which are running inside of my Kubernetes environment. One of the things that’s unique to our policy board is that we have this concept of tiering. Meaning, any traffic must fully pass through all of these policies before it gets evaluated by any of the other policies.
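The tiering idea can be illustrated with a deliberately simplified Python model. This is a sketch of the general concept only, not Tigera's actual evaluation logic: the `evaluate` function, the rule functions, and the `allow`/`deny`/`pass` verdicts are assumptions made for illustration.

```python
def evaluate(tiers, flow, default="deny"):
    """Walk tiers in order; the first allow/deny verdict is final.

    tiers: ordered list of (tier_name, [rule_function, ...]).
    Each rule returns "allow", "deny", "pass", or None (no match).
    """
    for tier_name, rules in tiers:
        for rule in rules:
            verdict = rule(flow)
            if verdict in ("allow", "deny"):
                return verdict, tier_name
            if verdict == "pass":
                break  # hand the flow on to the next tier
    return default, "no-match"


def security_rule(flow):
    """Security team's tier: block DMZ traffic into the inner zones."""
    src, dst = flow
    if src == "dmz" and dst in ("restricted", "trusted"):
        return "deny"
    return "pass"


def developer_rule(flow):
    """A conflicting, overly broad application policy."""
    return "allow"


tiers = [("security", [security_rule]), ("application", [developer_rule])]
print(evaluate(tiers, ("dmz", "restricted")))  # ('deny', 'security')
print(evaluate(tiers, ("dmz", "dmz")))         # ('allow', 'application')
```

Because the security tier is consulted first, the broad developer policy never gets a chance to override the cross-zone deny.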
This is a way that we set up guardrails to enable our security team, for example, to create firewall rules and not have developers or applications write a conflicting policy. I’m going to launch only one command today, and it’s just going to launch a Kubernetes deployment that we have. That deployment is our firewall integration. The way that deployment works is that it will run in the background and connect to Panorama. It will read the firewall rules and then apply those into your Kubernetes environment as network policies. It’ll actually translate the firewall rules into Kubernetes network policies. Tigera will pick up on any new policies that are added to the environment, so that might just take a second to run. But what we’re going to want to see is these DMZ, restricted, and trusted firewall rules, and how we’re limiting access between these environments, appear up here as Kubernetes network policies. Just give this another second to run; it’s not exactly a real-time sync.
Okay. A tier of network policies has been created here, and it says it’s our Panorama firewall tier of policies. The first thing you’ll note is that we seem to have fewer policies here. There’s a whole truckload of them over here within Panorama, but we only have three that have been applied to the Kubernetes environment. This is because of the power of network policies. Also, what we’re doing in the background is applying some intelligence to make sure that we’re being efficient and not creating policy sprawl.
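One way to picture how many firewall rules can collapse into a handful of policies is to group rules by source zone. The Python below is a hypothetical sketch: the rule tuples, the `panorama.<zone>` policy names, and the selector strings are invented for illustration and do not represent the integration's real data model.

```python
from collections import defaultdict


def consolidate(firewall_rules):
    """Group (src_zone, dst_zone, action) rules into one policy per source zone."""
    egress_by_zone = defaultdict(list)
    for src, dst, action in firewall_rules:
        egress_by_zone[src].append(
            {"action": action, "destination": f"zone == '{dst}'"}
        )
    return {
        f"panorama.{zone}": {"selector": f"zone == '{zone}'", "egress": rules}
        for zone, rules in egress_by_zone.items()
    }


zones = ["dmz", "restricted", "trusted"]
# Allow intra-zone traffic and deny anything crossing zones, as in the demo.
rules = [
    (src, dst, "allow" if src == dst else "deny")
    for src in zones
    for dst in zones
]
policies = consolidate(rules)
print(len(rules), len(policies))  # 9 3
print(sorted(policies))           # ['panorama.dmz', 'panorama.restricted', 'panorama.trusted']
```

In this sketch nine zone-to-zone rules become three label-selected policies, one per zone, which mirrors the reduction seen on the policy board.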
The reason why I wanted to use the GUI today is that it’s much easier to look at a policy in a GUI like this. This is our DMZ policy, and what I can see is that it denies any traffic that’s coming egress … Any protocol that’s moving from zone equals DMZ. This is basically enabling all of your intra-zone traffic. The same thing with your egress rules. It’s going to deny anything going out to restricted or trusted, but it’s also going to allow anything within the zone. This is taking a whole bunch of different rules around the DMZ and where it can connect, and it’s creating one single policy.
Now, a couple of different things that I can show here as well. Let’s say that a new policy gets created in Panorama on my AWS cluster, and I can add some sort of a policy here. I will call this my demo policy. Let’s say this is in the DMZ and it’s moving to our trusted zone. I’ll just grab some arbitrary service that’s up here. What I want to do here is allow that traffic.
Okay, so I’ve created a new policy that’s going from the DMZ to the trusted zone, and I have to commit the policy. Much like Tigera Secure, there is policy life cycle management, so you can stage a policy before you’re ready to commit it. Perhaps I’ll even add an additional one. Let’s say I add a new zone up here called demo zone, and commit that. What should happen now is that the process running in the back end will read these changes that have been committed to Panorama and then translate them into network policies.
Now, the first one that I created is basically allowing access for one particular service, so that’s going to go in and update one of our existing policies, our DMZ policy. I also created a new zone. In that case, what I should see is a policy for a new zone being created up here as well. This can take just a moment for this thing to occur.
Okay, you’d see that there’s a new demo zone that’s been created. This is for any zone called demo zone, and it’s denying any protocol to DMZ.
That was a very basic, quick zone that I created. It also would’ve updated the egress here to allow this particular service to communicate. In that way, your security team is able to control the zones, but you still have application teams who can deploy their applications within the zones. But let’s say that you have somebody who comes in through the interface and decides to add their own policy outside of the process of using Panorama. I’m just going to call this a bad policy.
I’m not really going to set any specific rules on it, but I’ll go ahead and apply that. Now, this would be a bit of a problem, because we’re coming in and overriding the rules that have been set within Panorama. Within Panorama, there’s not really any visibility into this. You don’t really want to go register this as a new rule in Panorama, so there has to be some way to mitigate this situation. What will happen with this service is that we’ll actually go in and delete that bad policy automatically, and you’ll notice that it’s not there.
Now that action gets logged within the system. There’s an audit log, but that’s going to ensure that folks don’t really go around the security team. Again, this thing takes about 30 seconds between each change to sync. Firewall rule changes don’t really happen that often. In fact, typically when you ask for one, it might take three or four weeks. There you go, the policy’s automatically been removed. I’m going to pause there, I just wanted to go over a high level overview.
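The behavior just demonstrated, where an out-of-band policy is removed and the action is logged, resembles a periodic reconcile loop. Below is a minimal Python sketch under that assumption; the `reconcile` function, the policy names, and the audit-log format are hypothetical and are not the integration's real interface.

```python
def reconcile(desired, actual, audit_log):
    """Make the protected tier match what the firewall manager defines.

    desired: set of policy names derived from the firewall manager.
    actual:  set of policy names currently present in the protected tier.
    Returns the tier contents after reconciliation and records each action.
    """
    for name in sorted(actual - desired):
        audit_log.append(("delete", name))  # policy added outside the process
    for name in sorted(desired - actual):
        audit_log.append(("create", name))  # newly committed in the manager
    return set(desired)


audit_log = []
desired = {"dmz", "restricted", "trusted", "demo-zone"}
actual = {"dmz", "restricted", "trusted", "bad-policy"}

# In the demo, a pass like this appears to run roughly every 30 seconds.
actual = reconcile(desired, actual, audit_log)
print(sorted(actual))  # ['demo-zone', 'dmz', 'restricted', 'trusted']
print(audit_log)       # [('delete', 'bad-policy'), ('create', 'demo-zone')]
```

Treating the firewall manager as the single source of truth is what keeps the tier from drifting, and the audit log preserves a record of every correction.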
There are additional things that we can do here in terms of monitoring the traffic flows. We can integrate this into the SIEM tool, which is another tool that the security team will use to identify any anomalous traffic and indicators of compromise, so that they know to go in and investigate something that’s happening within their environment. That’s a bit outside of the scope of today’s session. I’m going to stop sharing here, and I will open this up for any questions.
The first question that I have up here is, can we compare this to application access through a gateway? It says, “I can’t reach the services directly without valid credentials. Can that be equated to hiding them behind the firewall to restricted and trusted zone access?” In this case, if I’m understanding this correctly, it sounds like just restricting that access to a particular workload. The challenge is, let’s say someone does have valid credentials to get into that workload. From there, if you’re not setting up some kind of a zone-based architecture, if you’re not using network policy and doing segmentation, someone could legitimately figure out how to access that service. They would have, at that point, lateral movement and access throughout the rest of the cluster, if it hasn’t been segmented. By default, that’s generally how it’s done: the cluster’s left wide open.
I don’t think that you can really equate credentialed access to a workload, to using a zone-based architecture or a firewall. Then another question up here is, one can monitor for security without deploying a firewall, which is true. Lots of vendors have sensors for this purpose that are not firewalls, and so it seems like we’re conflating monitoring and restriction.
That’s absolutely true. One of the things that I was trying to say upfront is that, while I’m talking about this in terms of firewalls, there are many other constructs that create what we call the zone-based architecture. You have things like security groups and network security groups, and other ways that you can segment out an environment using VLANs and other tools. But effectively, it’s the same outcome that the security team is looking for.
Regardless of how that architecture is built, Kubernetes does tend to have challenges in its out-of-the-box state, sitting within those environments.