Microservices Security Summer Camp – Session 3: How to Evaluate Microservices Security Solutions

When evaluating a security solution for modern applications, you’ll need a checklist based on the best practices. Many checklist items are standard and you can adopt those from this webcast. Others may be specific to your environment and compliance requirements and we will be covering those in the Q&A.

Erica: Hello everyone. Thank you for joining Microservices Security Summer Camp. This is our third session, how to evaluate microservices security solutions. This is a three-part series, and this is our third and last one, but all the recordings will be available in 24 hours. Our speaker today, if you’re joining for the first time, is Christopher Liljenstolpe. He’s the original architect behind Tigera’s Project Calico. He actually speaks at over 60 meetups per year, so you might have already met him, and he consults with Tigera’s enterprise clients on security and compliance for their modern applications. Fun fact: he was actually a park ranger. So I’ll pass it over to him.

Christopher: All right. Thank you, Erica. So I’m going to assume that folks have watched, either live or recorded, the first couple of installments we had here. We talked about some of the security challenges that come with modern application design and deployment, then some of the techniques you can use to address those challenges, and the way things have changed from what I call heritage architecture to modern architecture from a security and networking standpoint. Today, what I’m going to talk a little bit about is how you can evaluate the solutions you’re looking at in this space, and how those line up with the challenges and the possible solution set to make this all work. So that’s what we’ll be talking about today. It helps to have a little bit of an understanding first, so if you’re watching this on video, not live, maybe go back and watch at least the second, if not both, of the earlier episodes.

Christopher: So with that said, let’s walk through it. The way I’ve got this structured today is I’m going to take each of those areas where there’s opportunity, or different security approaches, or different ways of thinking about security in this modern environment, and give you a couple of bullets for things to think about as you’re evaluating tools that address those issues. That’s how we’re going to do this. Please feel free to ask questions as we go. Erica will be watching for questions. Questions make it more interesting. So feel free to speak up.

Christopher: So the first one we talked a little bit about is code provenance, and the idea here is that the code that you’re deploying in your infrastructure can now come from a much wider variety of sources and is much more dynamic. So not only is code coming from your vendors or internal teams, there’s lots of open source, lots of projects. There are orchestration systems like Kubernetes that incorporate lots of open source projects as well, and Kubernetes is itself an open source project. Lots of developers, even if they’re writing custom code, will borrow lots of modules, maybe even lots of containers and pods that are already available out in the community, be it Docker Hub, Quay, et cetera.

Christopher: So you’ve got a much broader provenance, and people think about that in terms of getting software that’s intentionally or unintentionally compromised. You also need to think about licenses, because if you have a policy that says you only want to use, say, Apache or MIT or BSD 3-Clause licenses, and for whatever reason you don’t want to include some other license, say Oracle or GPL or something along those lines, you need to make sure that you’re actually scanning for that, probably automatically, to make sure that you don’t have a violation of your license policy.

Christopher: So one of the things you need to think about here is when you’re looking at scanning your components … Another thing I want to talk about, sorry, hopping back a little bit: I’m talking about application code here, but as we also talked about, the infrastructure is now managed as code as well, and the infrastructure very well may be code, be it open source projects like Istio, or commercial offerings or open source offerings, something like Tigera. So infrastructure itself is now code and has the same concerns. So you probably want to be scanning, and you can be doing this in your repo, like Quay or Docker Hub or your own private repos. You’d be doing this as part of your CI/CD chain, et cetera. But this is probably something you want to do, and it’s not just a scan once, it’s a continual scan. So you’re going to scan for CVEs, you’re going to scan for licenses. You may do static code analysis if you’re really into that as well.

Christopher: One of the things that’s possibly useful for a scanning tool would be the ability for it to label artifacts based on those results. Labels and metadata drive everything in this infrastructure and everything in this ecosystem. So Kubernetes makes lots of decisions based on metadata labels, so does Docker, so does … pretty much everything out there. Tigera’s solutions make a lot of policy (inaudible 00:05:36), policy decisions based on the labels attached to endpoints or artifacts. So having the scan can potentially highlight things, but having the scan also be able to attach tags or labels to workloads that are meaningful to your orchestration system, to your network policy or other security policy enforcement entities, is a good thing, because then that mark is made and that mark will follow that artifact throughout its life cycle in your platform.
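As a rough illustration of a scanner that labels artifacts from its results, here is a minimal Python sketch. The `ScanResult` type, the `scan.example.com/...` label keys, and the license allow-list are all assumptions made up for this example; they are not taken from any particular scanner or from Tigera’s products.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical allow-list, using SPDX-style identifiers for the licenses named in the talk.
ALLOWED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}


@dataclass
class ScanResult:
    """Hypothetical summary of one scan of one artifact."""
    artifact: str
    critical_cves: int
    licenses: set[str] = field(default_factory=set)


def labels_for(result: ScanResult) -> dict[str, str]:
    """Translate scan findings into labels that downstream policy could match on."""
    license_ok = result.licenses <= ALLOWED_LICENSES  # every license is on the allow-list
    return {
        "scan.example.com/cve": "clean" if result.critical_cves == 0 else "vulnerable",
        "scan.example.com/license": "approved" if license_ok else "needs-review",
    }
```

Because the scan is continual rather than one-off, a tool built this way would recompute the labels on every scan, so the mark on the artifact always reflects the latest result.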

Christopher: You might also want to link this to CI/CD, i.e., you check a piece of code in, it gets scanned, and if it fails the scan it gets bounced out to a review process or a merged call out, so that it doesn’t automatically go into deployment. So those are some things that you might want to think about when you’re talking about code provenance. Another one, and this is sort of related: automated deployment, or everything as code. Pretty much, as I just said and as we’ve talked about before, everything is now an artifact. Everything is now a piece of code. That code might be a configuration, it might be actual executable, compilable code, et cetera.

Christopher: In this model, things are changing very dynamically, minute by minute, and you have a potentially huge fleet of components out there: containers, pods, servers, et cetera. The days of having people log into those individual components and fix issues are over, and they’re over because, fine, I went and fixed that pod that had a bug, and that’s great. But Kubernetes later decides to kill that pod off and start up four more, for scaling or for a migration off of a server that’s being taken out of service. You’ll have a regression. You go back to the old behavior because the fix was made in a live component, not in the underlying code. The way you should be doing these things is whenever you need to make changes, be it configuration, be it code, et cetera, you should make the changes back in the repository and then push the change through the system rather than changing live.

Christopher: So any solution for your deployment mechanisms, et cetera, should support an immutability model. It should encourage people to edit the artifact, or the actual source, rather than the instance, and it should play very well with your source code control system and CI/CD chain. So all of your deployments should follow this process, and immutability is the only way you’re going to have a manageable infrastructure. Next: what anchors are being used in your modern application security environment? What is the thing that you’re using for identity, or to anchor policy or controls to? Secondary artifacts, artifacts that exist because you decided to deploy something or your orchestrator decided to deploy something, probably aren’t good anchors to tie your policies and other controls to, because those are very dynamic and very fungible.

Christopher: Workload Bob might have an address now, and five minutes from now it’s a workload of type Alice that has it. So anchoring things to IP addresses, or to which virtual interface on a server, or even to what server it’s on, is really a fool’s errand here. So what you really want to do, and this isn’t just for security folks, this is for every resource you might need to attach to a workload or an application component, is tie it to metadata or some other identity that’s assigned by and used by the CI/CD system, the orchestration system, et cetera. Those are the things that are going to stay with that artifact throughout its entire life cycle. No matter how it gets spun up, spun down, scaled up, scaled down, moved, et cetera, you’re tying to the actual artifact rather than a given instance.

Christopher: So these things like metadata labels, fingerprints, namespaces, service accounts, commit hashes, those are what’s solid and real in these new environments. Things like what host this thing’s running on, or which IP address it has, are ephemeral and not the right thing for your solution to tie security controls to, or any other kind of resource for that matter.
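To make the anchoring point concrete, here is a small Python sketch, under assumed names, of a selector that matches on labels rather than on addresses. The `Workload` type, the label keys, and the IP addresses are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    ip: str        # ephemeral: changes whenever the workload is rescheduled
    labels: dict   # stable: assigned by the CI/CD or orchestration system


def selects(selector: dict, workload: Workload) -> bool:
    """True if every key/value pair in the selector is present in the workload's labels."""
    return all(workload.labels.get(key) == value for key, value in selector.items())


# The same logical component before and after a reschedule (illustrative values).
selector = {"role": "backend", "env": "prod"}
before = Workload("backend-7f9c", "10.0.1.15", {"role": "backend", "env": "prod"})
after = Workload("backend-2b41", "10.0.3.87", {"role": "backend", "env": "prod"})
```

A control anchored to `selector` applies to both `before` and `after` without any change, whereas a rule written against `10.0.1.15` would silently stop applying, or start applying to whatever workload inherits that address next.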

Christopher: Blast radius control. We’re going to assume that you will have explosions in your infrastructure, a dynamic infrastructure that has a number of endpoints. Going back to the concept of zero trust, we’re going to assume that intentionally or unintentionally you will have mal-actors in your infrastructure. It could be a piece of infrastructure. It could be code, et cetera. But we’re going to assume that there will never be a state where your infrastructure is entirely clean. You should make the assumption that there are bad actors within your environment.

Christopher: So in order to control this, you need to layer policy and defense. You need to have tiered policy. Say the blast is that a developer wrote a policy that would unintentionally allow, say, HIPAA data to mix with non-HIPAA-certified workloads, or accidentally cut off your SOX compliance logging infrastructure. What you need is multiple tiers of policy, so that at a different, higher-priority tier the policies are not as dynamic, because they’re more conceptual, and a compliance team can say that things that have or manage HIPAA data can’t talk to non-HIPAA-certified workloads, or that everything must be able to communicate with the SOX auditing infrastructure. Those policies change less frequently, and they’re much more philosophical, if you want to think about it that way, and broader in scope.

Christopher: So those policies can still take precedence and override, say, a transient mistake that the developer makes at a lower level, because he’s changing policies very frequently, because his application graph is changing frequently. Every time somebody makes a change, that leaves an opportunity for a security event to occur. So in this case, what we’re saying is that by having tiered policies, you can have enforcement of that concept at multiple points in the policy chain, such that multiple points would have to be broken before you actually had a violation. Similarly, micropolicies and microsegmentation. If you make very complex policies, it becomes very hard to figure out what their actual impact is or will be, what they’re actually going to do. You take those policies and instead say: LDAP clients can talk to LDAP servers, or things that deal with trading need to talk to the SOX logging server, or whatever you’re using for auditing (inaudible 00:14:12).

Christopher: Those are very simple things, easy to understand, and put in the right priority scheme they will allow you to have policy enforcement and compliance. It will be easy to figure out, and you can then make changes to, say, just the SOX policy. If policy comes down and says you’re going to change the way you handle SOX auditing, instead of going to a bunch of different policies that all have a SOX component, you just go to the SOX policy and say here’s the change. It is reflected across everything that that policy’s attached to. Micropolicies, and microsegmentation.
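The tiering idea described above can be sketched in a few lines of Python. This is a deliberately simplified model, not the evaluation semantics of any specific product: tiers are checked in priority order, the first matching rule wins, and anything unmatched is denied. The `Rule` type, label keys, and rule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    src: dict      # label selector for the source workload
    dst: dict      # label selector for the destination workload
    action: str    # "allow" or "deny"


def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())


def evaluate(tiers: list, src_labels: dict, dst_labels: dict) -> str:
    """Walk tiers from highest to lowest priority; first matching rule decides."""
    for tier in tiers:
        for rule in tier:
            if matches(rule.src, src_labels) and matches(rule.dst, dst_labels):
                return rule.action
    return "deny"  # default deny when nothing matches


# Slow-changing, small, single-concept rule owned by a compliance team.
compliance_tier = [
    Rule("hipaa-isolation", {"data": "hipaa"}, {"hipaa-certified": "false"}, "deny"),
]
# Fast-changing rule owned by a developer; the empty dst selector is the "mistake".
developer_tier = [
    Rule("billing-egress", {"app": "billing"}, {}, "allow"),
]
TIERS = [compliance_tier, developer_tier]
```

With this ordering, the over-broad developer rule cannot leak HIPAA data to an uncertified workload, because the compliance rule is evaluated first and denies the flow.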

Christopher: If you have a segmentation where everything in app A can talk to everything else (inaudible 00:14:54) in app A, and everything in app B can talk to everything else in app B, that’s great, except your blast radius is now app A. So, what you may want to do is have finer-grained segmentation that says even within app A, only front ends can talk to back ends, and back ends can talk to databases. The back end that’s handling actual trading transactions has to talk to the SOX log. That might be a finer-grained microsegmentation.

Christopher: This also allows you to handle east-west traffic, which we’re going to talk about in a minute. You will be able, if you have microsegmentation, you’ll be able to allow something in A to talk to something in B, because as we go down this microservices road, more traffic will be going east-west. They will be crossing what used to be fairly coarse boundaries.

Christopher: And, it’s got to be arbitrarily fine granularity. You may decide today that your only policies are going to be DEV-TEST-PROD, but over time you will probably need to make those finer, or you may say for this application, all I need is DEV-TEST-PROD. For another application, I need very fine-grained segmentation, perhaps.

Christopher: You need to make sure your platform can handle everything from very coarse granularity, very coarse segmentation and policies, down to very fine-grained segmentation and policies. You don’t want an arbitrary restriction there.

Christopher: You need to enforce these things, these controls, at multiple points and at multiple layers. Again, if I have an untrusted piece of infrastructure and that is compromised, and I’m only enforcing at that point, I’ve already lost the battle. It may be that if I’m only enforcing at one layer, somebody might sneak through.

Christopher: So let’s say I write a policy that allows certain workloads to talk to other workloads over HTTPS, and then those same sources can talk to the destinations, but only doing GETs to CUST RECORD URIs in HTTP, and on top of that, the source (inaudible 00:17:10) can only communicate with mTLS certificates. I now have enforcement at multiple layers of the infrastructure. Even if one layer of control fails, I am still policy-compliant. More importantly, all those layers have different signals. There are different things I’m matching on, different things that I’m looking at.

Christopher: So, I can now build a much more holistic picture of the components that are communicating, and decide if that communication should happen, rather than looking at a single thing, i.e., the IP source and destination address and port. That’s important, but if I can layer more signal on there, i.e., what kind of HTTP transactions it’s doing, does it have the right TLS certs, that gives me much more signal to make a holistic decision about whether I should allow this traffic or not.
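Here is a minimal Python sketch of what combining signals from several layers might look like. The `Request` type, the `/cust_record` path prefix, and the trusted identity string are assumptions invented for this example; a real system would take these signals from the network stack, an HTTP proxy, and a verified mTLS certificate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    dst_port: int
    method: str
    path: str
    peer_identity: Optional[str]  # identity from a verified mTLS client cert, else None


# Hypothetical identity naming; any stable workload identity would do.
TRUSTED_IDENTITIES = {"spiffe://example.org/frontend"}


def transport_ok(req: Request) -> bool:
    return req.dst_port == 443  # HTTPS only


def application_ok(req: Request) -> bool:
    return req.method == "GET" and req.path.startswith("/cust_record")


def identity_ok(req: Request) -> bool:
    return req.peer_identity in TRUSTED_IDENTITIES


def allow(req: Request) -> bool:
    """Every layer must agree; a failure at any one layer blocks the request."""
    return all(check(req) for check in (transport_ok, application_ok, identity_ok))
```

Each check looks at a different signal, so a request that slips past one of them (for example, the right port but a POST, or the right URI but no valid certificate) is still rejected by another.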

Christopher: All these controls need to be tied back to RBAC. You need to make sure that the people changing these controls, or initiating these controls are the people you expect them to be. Both to make sure that somebody doesn’t get in and make changes that are inimical, but you also need that for auditing and compliance. Who’s made changes, when have they made changes. It also allows you to come back and ask them why changes were made. Maybe policy’s incorrect, the person has to keep working on a certain policy. Maybe that allows you to optimize things. A little bit off blast radius control, but RBAC-tied controls are important, here.

Christopher: Assumption of absolute trust in any given resource will have predictable results. This goes back to the multiple points and multiple layers. If I assume I’m going to have mal-actors in my infrastructure, then I have to assume that if I give absolute trust to any single component, I will be punished for that assumption.

Christopher: East-west. When we started off this journey 30 years ago, most traffic came into an application from the user and went back to the user at the end, and there was very little network traffic other than up and down through that given application stack. That, we call north-south traffic. East-west traffic is communication between components, potentially of different applications.

Christopher: Today, those numbers … People always used to talk about 10% east-west, 90% north-south. That number’s pretty much flipped on its head. A while ago it was 50% east-west, 50% north-south. Now one can say it’s probably 90% or more east-west and 10% north-south, as web has become the presentation mechanism, and applications have become much more interdependent (inaudible 00:19:56) application components.

Christopher: You can also think of east-west as intra-cluster or inter-datacenter traffic, or (inaudible 00:20:02) datacenter traffic. Today, in the heritage model of doing things, the way you control traffic east-west is you send it through firewalls that reside within the infrastructure managing that east-west traffic. That means that that middle box, or that firewall, is a choke point, and it’s just that: a choke point. Basically, the more traffic that’s going east-west, the more of those I need to stand up, and I need to properly load balance between them, et cetera, et cetera. But at some point they will always be a choke point.

Christopher: If 90% of my traffic is east-west, and in a given cluster I have, say, a thousand servers and those thousand servers each have a, say, 50 gig interface, all of a sudden you’ve got 50,000 gig worth of traffic, potentially going east-west. You can go do the math on how many middle box firewalls you’ll need to handle that, and they’ll need to be put everywhere where those interconnections happen. So, this becomes a problem.
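The arithmetic here is simple enough to work through directly. In the Python sketch below, the server count, interface speed, and east-west share come from the talk; the per-firewall throughput of 100 Gb/s is purely an assumed figure for illustration, so substitute whatever your middle boxes can actually sustain.

```python
import math

servers = 1000
nic_gbps = 50                 # per-server interface speed, from the talk
east_west_percent = 90        # share of traffic that is east-west, from the talk
firewall_gbps = 100           # ASSUMPTION: sustained throughput of one middle-box firewall

# Total capacity the servers could push onto the network.
total_gbps = servers * nic_gbps

# Portion of that capacity that would be east-west at the stated ratio.
east_west_gbps = total_gbps * east_west_percent // 100

# Middle boxes needed to carry east-west traffic at line rate (before any redundancy).
firewalls_needed = math.ceil(east_west_gbps / firewall_gbps)
```

Under these numbers the fleet can generate 50,000 Gb/s in total, 45,000 Gb/s of it east-west, which would take 450 such firewalls at full line rate, before counting redundancy or load-balancing overhead.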

Christopher: In order to … So what you should be thinking about instead of following that model is that enforcement of all traffic, especially east-west but if you’re going to do east-west, you might as well (inaudible 00:21:25), needs to be at the edge. By the edge, I mean the edge of your perimeter and more importantly, the edge of your workloads. The enforcement needs to be right in front of each of the workloads for any security solution you’re doing, such that it scales with the number of workloads and the number of hosts and everything else you deploy. You need to move your enforcement to the edge, rather than the core. Becomes a smart edge, dumb core infrastructure. That’s the way service providers built networks for a long time for scale, it’s what we need to do now, here. Smart edge, dumb core.

Christopher: Rate of change. If a solution that you’re considering has some single entity that’s going to compute the complete set of flow graphs, or security relationships, at cloud rates of churn where things are changing on a minute-by-minute or second-by-second basis, you will also have predictable results. There is nothing today that is going … where somebody can say, at scale, a single centralized controller can have absolute view with certainty of everything that’s in the infrastructure. When you’re talking about thousands and thousands of servers, and hundreds of thousands of workloads, you will never have a specific, real-time, complete calculated view of what should be happening. This needs to be distributed out to the edge. Again, smart edge, dumb core.

Christopher: Assumptions that infrastructure will ever converge also will have predictable results. If your solution you’re looking at assumes that at some point there will be a point where policy and everything else will be exactly the same everywhere, it is a fallacious assumption. You can’t make that assumption.

Christopher: We had a meeting a long time ago with a user of Project Calico, and they asked if they could have a piece of software that would tell them, for their cluster of … let’s just call it multiple five-figures worth of servers, whether at a given point in time they could be assured that the same policy was pushed everywhere: sort of, at a point in time, yes, policy is converged.

Christopher: When you look at that from a centralized viewpoint, the code to do that is really easy to write. It’s basically return “null.” You’ll never get that, because by the time you converge everywhere, or everyone’s reported back, something will have changed on one of those hosts. One of those hosts might have gone down, a container will have changed, et cetera.

Christopher: So, you have to assume that everything is continually dynamic, which means … The corollary of this is that to deal with rate of change, you need an intent model, a high-level model that can be rendered at the edge rather than in the core. Bobs can talk to Alices, and then everything that has a Bob or an Alice can make its own appropriate decisions about what flows should happen. Rather than building that full map of all the Bobs and all the Alices centrally and telling everything exactly what to do, you send the intent. That’s the way Kubernetes and all these other components work: you send intent down to the edge, and say this is what things should look like.
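A small Python sketch may help show what rendering intent at the edge means. In this assumed model, every node receives the same short intent list and renders rules only for the workloads it hosts, so no component ever has to compute the full Bob-to-Alice flow graph. The data shapes and the `render_on_node` function are hypothetical.

```python
# The intent is small and identical everywhere: "Bobs can talk to Alices".
INTENT = [
    {"from": {"type": "bob"}, "to": {"type": "alice"}},
]


def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())


def render_on_node(intent: list, local_workloads: dict) -> dict:
    """Render ingress rules for the workloads on ONE node.

    local_workloads maps workload name -> labels, and contains only what this
    node hosts. The result maps workload name -> list of allowed source selectors.
    """
    rendered = {}
    for name, labels in local_workloads.items():
        allowed_sources = [rule["from"] for rule in intent if matches(rule["to"], labels)]
        if allowed_sources:
            rendered[name] = allowed_sources
    return rendered


# Two nodes with different local views (illustrative names).
node_a = {"alice-1": {"type": "alice"}, "carol-1": {"type": "carol"}}
node_b = {"bob-1": {"type": "bob"}}
```

The work each node does is proportional to what it hosts, not to the size of the cluster, which is the property that lets this approach keep up with cloud rates of churn.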

Christopher: And two, you have to assume asynchronicity. It may take on the order of milliseconds, five, ten milliseconds, to get a change replicated throughout the infrastructure. But that’s still asynchronous, rather than everyone updating at exactly the same time. If your solution requires synchronicity, where everything updates at exactly the same tick of the clock, then you’re, again, probably going to have a bad day at scale. So you have to assume that everything is working toward eventual convergence but will never actually quite get there, because in a scope this big, things are just changing too much to ever achieve complete convergence. You still have security, but you will not have convergence at any single point in time.

Christopher: Infrastructure hardening. If it’s hard to understand, it’ll be hard to operate. Complex to operate means complex to secure, which goes against infrastructure hardening. So, first of all, the solution’s got to be simple, and it has to be something that anyone who understands the underlying platform can understand.

Christopher: Blocking CLI access to everything is probably a good idea, going back to the immutability concept. Infrastructure security failures very often come down to somebody trying to fix a problem: oh, I’ll just make this change for a minute or two. And that’s all it takes. If you want a class on that, go open up a host on Amazon or whatever, with default logins or even simple-to-guess logins, and see how many minutes it takes for somebody to find that host and own it. Or any cloud (inaudible 00:27:05), it’s not Amazon-specific.

Christopher: So, blocking CLI access is a generally good thing. That way, my CI/CD can help me make sure that the infrastructure hardening rules I put in place are still there. You always, maybe, need a break-glass option here, but that shouldn’t be your go-to for dealing with issues. You want to consider things that will support isolated management networks. Not everyone’s going to use them, but you might want to put your control plane on a network separate from production. That’s a useful thing.

Christopher: What’s really important is that when you have an issue, not if, you’ll know what happened. So you need support for accurate logging, trustworthy logging, which means I need to have immutable logs, non-repudiable logs. We’ll talk about that a little bit in a minute. And everyone needs a common concept of time. Time is really important across a fleet of thousands of servers and hundreds of thousands of workloads to figure out what happened. If you’ve got even a couple seconds off in clock ticks between different servers, or even a second off, it becomes much harder to figure out what actually happened in the infrastructure.

Christopher: Whatever you’re doing should play well with other layers of control, like mandatory access controls such as SELinux, where even root is not root. You can say that very specific groups of people are allowed to control this. So even if I log in with (inaudible 00:28:32) on a host, I still can’t make changes to, say, memory protection or network policies. I need to have special credentials to do that.

Christopher: Intent models. We talked about this a little bit earlier. Imperative models at scale are interesting. Interesting in a not-good way. Instead of telling the system what to do, you should express your intent to the system, and let the system render it. What you should say is Bob can send LDAP traffic to Alice, not that this IP address, which has a very short lifetime, allows LDAP traffic from another IP address. That’s an imperative model. The intent is: things that are type Bob should be able to send LDAP traffic to things that are type Alice.

Christopher: Similarly, conceptually, EU backends should only connect to EU databases, but not US databases. Again, that’s not some kind of networking construct. The reason for this is a couple-fold. One, today EU backends and EU databases are on potentially different VLANs. Tomorrow there may be an entirely different mechanism for doing that, or there might be 10 VLANs for EU backends. And now you’ve gotta go make those policy changes wherever you used a secondary identity or an imperative statement to enforce something. So now you need to remember where all of those rules about EU backends are and make those changes everywhere, versus in one place that says EU backends are identified by these labels.

Christopher: Secondly, this becomes even more interesting as we go to a point where you potentially have changes in underlying infrastructure. You’re multi-cloud or whatever else. If your policy is EU backends should be able to connect to EU databases, well, on one cloud maybe you’re using VLANs to identify those. On another cloud, maybe you’re using Kubernetes labels to identify those things. If I’ve written my policies or my security concepts around the realities of a specific cloud, guess what, I’m going to have to rewrite those entirely to match the other cloud’s model. And good luck making sure the policies always stay in sync.

Christopher: Whereas in an intent model, you can just say EU backends should be able to connect to EU databases, and on one cloud that gets rendered as VLAN rules and on another cloud it gets rendered as network policy in Kubernetes. So you really want to think about pure intent, and not a centralized drive of the actual underlying infrastructure or applications. Let the layer of (inaudible 00:31:33) render those as appropriate and as distributed as can be.
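As a sketch of one intent rendered two ways, the Python below turns a single "EU backends may connect to EU databases" statement into a Kubernetes NetworkPolicy manifest (expressed as a dict) and into VLAN-style rules. The intent format, the label keys, the VLAN mapping, and the rule string syntax are all hypothetical; only the NetworkPolicy field names follow the Kubernetes `networking.k8s.io/v1` API.

```python
INTENT = {
    "name": "eu-backend-to-eu-db",
    "source": {"region": "eu", "role": "backend"},
    "destination": {"region": "eu", "role": "database"},
}


def render_kubernetes(intent: dict) -> dict:
    """Render as an ingress NetworkPolicy on the destination pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": intent["name"]},
        "spec": {
            "podSelector": {"matchLabels": intent["destination"]},
            "policyTypes": ["Ingress"],
            "ingress": [
                {"from": [{"podSelector": {"matchLabels": intent["source"]}}]},
            ],
        },
    }


def label_key(labels: dict) -> tuple:
    """Order-independent dictionary key for a set of labels."""
    return tuple(sorted(labels.items()))


def render_vlan(intent: dict, vlan_map: dict) -> list:
    """Render as permit rules between VLANs; vlan_map is a hypothetical labels->VLAN-IDs table."""
    sources = vlan_map[label_key(intent["source"])]
    destinations = vlan_map[label_key(intent["destination"])]
    return [f"permit vlan {s} -> vlan {d}" for s in sources for d in destinations]
```

If EU backends later move from one VLAN to ten, only the mapping table changes; the intent itself, and the Kubernetes rendering on the other cloud, stay exactly as they were.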

Christopher: Micro-segmentation and micro-policies. I think we talked about this a little bit, so it may be a bit of a duplicate. Large complex policies incorporate multiple concepts. That makes them interesting. The same concepts end up in multiple policies, and when one changes, I’ll guarantee you that you will miss one of those. So let’s say you’ve got policies for applications that mix multiple roles or concepts. So you say the application A front-end (inaudible 00:32:07) can talk to LDAP servers on 636 or 631, and can do a bunch of other things. And the application B front-end can talk to LDAP servers on 631 or 636, because you allow LDAP on 631 or 636. You do that for each of your applications, so every application has its big complex policy, of which one component is LDAP. Now (inaudible 00:32:34) says we’re not gonna allow non-TLS LDAP, so you need to drop the 631. I’ll guarantee you, you will miss one of those policies. Whereas if instead you have a micro-policy that says things that are front-ends can talk to LDAP servers on 631 or 636, then application A front-ends will inherit that policy, application B front-ends will inherit that policy, et cetera, et cetera.

Christopher: Then I have just one place to change the behavior of LDAP. Now you go to that one place and remove 631 as a port, and that policy then gets rendered everywhere, rather than making the changes at each point. Coarse segmentation will increase your blast radius. If I make one big application A policy where anything in application A can talk to anything else in application A, my blast radius is application A. The worst case of coarse segmentation: everything inside my perimeter is trusted and can talk to one another, and everything outside my perimeter is considered bad. If you’re a large enough company, I will guarantee you that behavior will probably end you up in front of a congressional hearing at some point, or your CEO anyway. As can be seen by various companies that do credit reporting and other things.

Christopher: You need to contain the inevitable incident, and you do that by scoping down as much as possible what can talk to what. Policies, and I sort of touched on this already, should be tied to specific concepts and flows and relationships. And the labels that you attach them to also probably need to be related to roles, capabilities, personalities. So maybe what you say is workloads that need to be able to talk to LDAP servers should have a label of LDAP consumer: they’re consuming LDAP resources. And LDAP servers have a label that says LDAP producer. And then your policy is anything labeled LDAP consumer can send traffic to things labeled LDAP producer, or client and server. In that case, you don’t even need a policy for front-ends and another one for backends (inaudible 00:34:48) to talk to LDAP. You just attach the LDAP consumer label to everything that needs to be able to talk to LDAP. So that’s the way you should be thinking. That’s what your security policy should allow you to do.
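The consumer/producer labelling can be sketched as a single micro-policy in Python. The policy shape, the `ldap` label key, and the `without_port` helper are hypothetical; the port numbers are the ones spoken in the talk and are used here only to mirror the example.

```python
# One small policy that expresses one concept: who may speak LDAP to whom.
LDAP_POLICY = {
    "from": {"ldap": "consumer"},
    "to": {"ldap": "producer"},
    "ports": frozenset({631, 636}),  # port numbers as given in the talk
}


def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())


def allowed(policy: dict, src_labels: dict, dst_labels: dict, port: int) -> bool:
    return (
        matches(policy["from"], src_labels)
        and matches(policy["to"], dst_labels)
        and port in policy["ports"]
    )


def without_port(policy: dict, port: int) -> dict:
    """Return a copy of the policy with one port removed: the single place to change."""
    return {**policy, "ports": policy["ports"] - {port}}


# Any workload, from any application, opts in by carrying the consumer label.
app_a_frontend = {"app": "a", "role": "frontend", "ldap": "consumer"}
app_b_backend = {"app": "b", "role": "backend", "ldap": "consumer"}
ldap_server = {"role": "directory", "ldap": "producer"}
```

Dropping a port is then one edit to one policy, and every labelled workload across every application picks up the change, which avoids the missed-copy problem that comes with per-application policies.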

Christopher: End-to-end encryption is here, folks. That’s a good thing. (inaudible 00:35:08) those security solutions shouldn’t introduce insecurity. Lots of folks out there still want to do DPI in an encrypted world, so they insert a device that will decapsulate and decrypt the traffic, inspect it, then re-encrypt it. In security terms we call that a man-in-the-middle attack. At that point, you have put something in the chain that is specifically going to break the security you have by breaking the encryption in midstream. It also is a wonderful place to attack. In fact there was a set of slides that went around after a certain Mr. Snowden left government (inaudible 00:35:49) and one of those slides has a happy face pointing at it (inaudible 00:35:54). Man-in-the-middle attack on encryption.

Christopher: So first of all, you made things less secure. Secondly, it’s a wonderful place to attack. All the keys necessary to decrypt all your traffic flowing through that middle box are in one place. What could possibly go wrong with this model? So what you probably want is the ability to inspect before or after encryption, rather than breaking encryption in midstream. Since you’re encrypting at the edge, what this means is your solution needs to support both, i.e., very distributed inspection. It looks like my slide editing dropped something here. So basically what I’m saying is if I’m encrypting at the edge, and I don’t want to break encryption in the middle, that means I also need to inspect at the edge, i.e., I need a very distributed inspection capability. This is where things like service routing, et cetera, come into play. What you want to do is encrypt right at the edge, right at the trust boundary of that given microservice even.

Christopher: But before that happens, actually do your inspection. So encrypting at the edge, inspecting at the edge. With central boxes, it's one of two things. (inaudible 00:37:07) that's another 443 packet, that's another HTTPS 2.0 packet. Or you're gonna get a security violation at some point. So inspect at the edge and then encrypt. Or at the other end, decrypt at the edge, and then inspect.
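
The ordering described above (inspect, then encrypt on the way out; decrypt, then inspect on the way in) can be sketched as follows. This is only an illustration of the ordering: `placeholder_encrypt` and `placeholder_decrypt` use base64 as a stand-in and are not encryption, and the inspection rule is a hypothetical one invented for the example.

```python
import base64


class PolicyViolation(Exception):
    pass


def inspect(plaintext: bytes) -> None:
    # Hypothetical rule: refuse payloads carrying an obvious secret marker.
    if b"BEGIN PRIVATE KEY" in plaintext:
        raise PolicyViolation("payload blocked by edge inspection")


def placeholder_encrypt(plaintext: bytes) -> bytes:
    # NOT encryption: base64 stands in for the real TLS/mTLS layer.
    return base64.b64encode(plaintext)


def placeholder_decrypt(wire: bytes) -> bytes:
    return base64.b64decode(wire)


def send_from_edge(plaintext: bytes) -> bytes:
    inspect(plaintext)                     # inspect while still in the clear
    return placeholder_encrypt(plaintext)  # then encrypt at the sending edge


def receive_at_edge(wire: bytes) -> bytes:
    plaintext = placeholder_decrypt(wire)  # decrypt at the receiving edge
    inspect(plaintext)                     # then inspect
    return plaintext
```

Nothing between the two edges ever holds plaintext or keys, so there is no central box to attack.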

Christopher: RBAC. Compliance requires audit. You can't audit if you don't have AAA. So you need to have a AAA system here. The last thing you want is another YAAAAS. Everyone has multiple YAAAASs and you don't need another one. So whatever you're doing for RBAC for your security solution needs to tie in to the auditing, authentication, authorization system that you already have.

Christopher: So hopefully everything in your ecosystem that you're using to deploy will do the same thing, will use the same thing. Having reader, operator, owner as the levels of groups is way, way too coarse. Different groups have different responsibilities, different areas. So your system needs to support multiple groups: arbitrary numbers of groups, and the ability for any entity to belong to multiple groups at the same time. And then those groups can be assigned to roles in your security system. So you need to look for a security system that will use role-based access controls that allow potentially reader, operator, editor, owner, et cetera within that role. But having just reader, operator, owner flags on users is just way too coarse.
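
A rough sketch of the group-to-role model described above, in Python. The entity names, group names, scopes, and verbs are all hypothetical. In practice the memberships would come from the AAA system already in place, not from a dictionary.

```python
# Entities (human or not) can belong to any number of groups.
memberships = {
    "alice": {"payments-team", "sre"},
    "ci-bot": {"deployers"},
}

# Groups are bound to roles; each role grants verbs within a scope.
role_bindings = {
    "payments-team": [("payments", {"read", "edit"})],
    "sre": [("payments", {"read"}), ("infra", {"read", "edit", "own"})],
    "deployers": [("payments", {"edit"})],
}


def permitted(entity, scope, verb):
    """Allow the action if any of the entity's groups holds a matching role binding."""
    for group in memberships.get(entity, ()):
        for bound_scope, verbs in role_bindings.get(group, ()):
            if bound_scope == scope and verb in verbs:
                return True
    return False
```

Here `alice` is an owner of `infra` but only an editor of `payments`, which a single reader/operator/owner flag on the user could not express.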

Christopher: The other thing to think about is it's not just AAA for users. It's also for applications, for robots, et cetera. You need to AAA everything because most things in these environments are going to be done by non-human actors. If the log isn't immutable and non-repudiable, it isn't a log. It's a fallacy, it's a piece of fiction. So you need to make sure your logs are immutable and non-repudiable, or that you have a way of assuring that.
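
One common way to get tamper evidence is a hash chain, where each record's hash covers the previous record's hash. The sketch below uses Python's standard `hashlib` and is a swapped-in illustration, not something the talk prescribes. It assumes the latest hash is anchored somewhere the writer cannot alter, and it does not by itself give non-repudiation, which would need per-actor digital signatures.

```python
import hashlib
import json


def _digest(prev_hash: str, entry: dict) -> str:
    body = json.dumps(entry, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append(log: list, actor: str, action: str) -> None:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"actor": actor, "action": action}  # actor may be a human or a robot
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed record breaks it."""
    prev_hash = "0" * 64
    for record in log:
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```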

Christopher: If the log doesn't account for automated operations as well as human operations, it also is not a log that is useful in these environments, like I said. The logs have to be useful as well, though. And again, I keep on harping on this because this is an important thing. If the logs are tied to things like IP addresses, or things that are ephemeral, the log is useful only at the point in time when it gets logged, because five minutes later the IP address will be assigned to something else. So at that point, any looking back, any retrospection in your logs, is useless unless you have another log that lists what that ephemeral endpoint key that's in the log was assigned to at that point in time. And then you get to combine the two, and hopefully your clocks all work.
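
To make the cost of IP-keyed logs concrete, here is a sketch of the second lookup they force on you: resolving which workload held an address at a given moment. The assignment history, workload names, and integer timestamps are invented for the example.

```python
from bisect import bisect_right

# Hypothetical assignment history per IP: (start_time, workload), sorted by time.
assignments = {
    "10.0.0.7": [(100, "frontend-abc"), (400, "batch-job-xyz")],
}


def workload_at(ip, timestamp):
    """Return the workload that held `ip` at `timestamp`, or None if unknown."""
    history = assignments.get(ip, [])
    starts = [start for start, _ in history]
    idx = bisect_right(starts, timestamp) - 1
    return history[idx][1] if idx >= 0 else None
```

The answer flips between timestamps 399 and 400, so even a small clock skew between the two logs can attribute a flow to the wrong workload.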

Christopher: So it's better if the logs are tied to meaningful metadata coming out of the orchestrator that's driving all of this stuff. So again: labels, namespaces, hashes of check-ins, et cetera. Things that will allow you, when you retrospect in the log, to know exactly what it is that you're looking for. You also want to make sure that your security system doesn't have its own logging environment. YALS, yet another logging system, is about as fun as YAAAAS is. So you wanna make sure that you are able to send those logs to whatever you're using as a larger logging environment.
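
As an illustration, a log record can carry orchestrator metadata and be handed to an existing pipeline using Python's standard `logging` module. The field names (`namespace`, `labels`, `commit`), the sample values, and the in-memory stream standing in for the real log shipper are assumptions for this sketch.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "message": record.getMessage(),
            # Orchestrator metadata, passed in via the `extra` argument.
            "namespace": getattr(record, "namespace", None),
            "labels": getattr(record, "labels", None),
            "commit": getattr(record, "commit", None),
        }, sort_keys=True)


stream = io.StringIO()  # stand-in for the handler of an existing log pipeline
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("policy-audit")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "flow denied",
    extra={"namespace": "payments", "labels": {"role": "ldap-consumer"}, "commit": "3f2a9c1"},
)
```

Swapping the `StreamHandler` for whatever handler the central logging system provides keeps the security logs out of a separate silo.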

Christopher: There will also be lots of logs. Unless you can search, constrain, or group when you're actually looking for things in the logs, it will all just be bits on lots of platters and pieces of plaster, completely useless. So whatever you're using for logging needs to be able to search, constrain, and group logs, so you can look at subsets of the logs and correlate them. Similarly, that means that your security solution that's sending out those logs needs to have the necessary bits of data in there to allow you to search, constrain, and group, again going back to that metadata, meaningful metadata.
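
A small sketch of constraining and grouping, assuming the log records already carry metadata fields. The records and field names are made up for illustration. A real logging backend would do this with its own query language.

```python
from collections import defaultdict

records = [
    {"namespace": "payments", "label": "ldap-consumer", "action": "deny"},
    {"namespace": "payments", "label": "frontend", "action": "allow"},
    {"namespace": "search", "label": "ldap-consumer", "action": "deny"},
]


def constrain(records, **criteria):
    """Keep only the records whose fields match every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def group_by(records, key):
    """Bucket records by the value of one metadata field."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(key)].append(r)
    return dict(groups)


denied = constrain(records, action="deny")
by_namespace = group_by(denied, "namespace")
```

None of this works unless the emitting system put `namespace` and `label` into the record in the first place.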

Christopher: Anyway, thank you very much for attending. As Erica has said, the recordings of this will be up and available in 24 hours on the Tigera website, Tigera.io/resource. All right, thank you very much, folks, and have a good rest of your week.