Istio Traffic Management – Best Practices in Secure Kubernetes Environments


Istio’s traffic management decouples traffic flow from infrastructure scaling, allowing you to specify the rules you want traffic to follow rather than which specific pods should receive it. In this webinar, we’ll discuss the following traffic management topics: Discovery and Load Balancing, Failure Handling, and Fault Injection.

Michael: Hello, everyone and welcome to today’s webinar, Istio Traffic Management: Best Practices in Secure Kubernetes Environments. I am pleased to announce today’s speaker, which is Christopher Liljenstolpe. He is the original architect behind Tigera’s Project Calico. He speaks at over 60 meetups a year, educating people on networking and network security for modern applications. Without further ado, let me hand it over to Christopher.

Christopher: Alright. Thank you, Michael. If we take a look at the definition of what Istio is, Istio is a platform that delivers a concept called a service mesh, and there are other service mesh platforms that are out there. Istio, however, seems to have the most currency in the Kubernetes environment, so that’s the one we’re going to be focusing on here. It’s actually the platform that we’ve decided to integrate with here at Tigera for both Project Calico, our open source, as well as our commercial offerings.

But what is a service mesh? You’ll hear people asking that question. I ask it at meetups: are people deploying service meshes or looking at service meshes, and which ones are you looking at? But what do we mean by a service mesh? If we think about the way we used to build applications … The way we used to build applications is we’d build a large monolithic code base, and a lot of the functionality, or at least a lot of related functions, would be compiled together into one object, or into one

As we have started going down the road, and we’ve been doing this for a while, of basically disaggregating those complex monolithic applications into functional components. What we end up with … So, instead of now having a large, monolithic piece of code, we now have a number of smaller executables that are running, that each do some subset of the functions that one big piece of monolithic code did before.

The reason we might do that, is it is easier to maintain smaller pieces of code, easier to update smaller pieces of code. It forces a certain maturity around making sure there are clean abstraction layers between different functions. And also it means that we can reuse that code and reuse that module, or the capability of that module, for multiple applications.

Christopher: So, we now have a disaggregated cloud of smaller components that are pieced together to deliver a specific service. Which means that we’ve got a lot of network connectivity between these and a lot of messages passing between them. So, what we’ve now created is a necessity to build a network API substrate into your applications.

Initially we did this, everyone, every developer, etc. wrote their own interfaces to all the different API’s and did all of the plumbing for all of this. This becomes harder and harder to manage as this disaggregated cloud gets larger and larger. So, what is a service mesh?

A service mesh describes exactly this: a network of microservices and the interactions between them. And indirectly it means that we have to manage that interaction of microservices. To do that, there are a number of functional components that you need in a service mesh.

You need to be able to discover the services. I need a service that provides me X. I need to find the thing or things that can provide X. If there are multiple instances of the thing that can provide X, I need to load balance between those, and I need to do that in such a way that I do get reasonable load spreading.

I need to be able to handle failure recovery. If one of the instances providing X fails, I need to be able to recover from that failure and go talk to something else. Maybe another exact instance or something that has a slightly less capable version of X, etc. I need to figure out how to fail and recover.

I also need to get metrics and monitoring. What’s actually going on in this fabric now? Operationally, there are other requirements. I need to be able to do things like A/B testing. I’ve deployed a new version and I wanna test between the two versions, see which one users like more, so I can decide which one to go forward with.

I need to be able to do canary releases. I wanna release a new version of my service into the wild and only have a small subset, maybe a pre-nominated subset, of customers using it, it’s called a canary release. So, I can see if it works and it’s bug free before I go into general deployment with it.

I need to be able to rate limit. I need to be able to make sure that a load on a specific microservice doesn’t cause my whole environment to crash. I need to be able to do access control and end-to-end encryption, and other things. So, all of these things and more, you need to be able to do in a service mesh. In early generations, developers had to keep all of these capabilities in mind each time they wrote a microservice.

So, if we think about it, really, a service mesh in a simple case would be just two microservices being able to communicate with one another. And again, this is what we need to be able to do, what we need to be able to handle. Network timeouts. If all of a sudden A can’t connect to B in a certain amount of time, what do we do?

If A is clobbering B with requests such that B becomes unavailable, we need to be able to handle that; that’s what circuit breaking, and rate limiting as well, are for. And again, like I said, we need to be able to control some of the security characteristics. So, this is sort of easy if my microservice graph, my service mesh, or my service graph is just two endpoints. And three endpoints, not that much more difficult, etc.
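
To make the circuit breaking idea above concrete, here is a minimal Python sketch. It is not how Istio or its proxy implements circuit breaking; the class name, the thresholds, and the injectable clock are all assumptions made for illustration. The breaker opens after a number of consecutive failures and rejects calls locally until a cool-down has passed.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: opens after N consecutive failures, then
    rejects calls until a cool-down period has passed."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # hypothetical cool-down, in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: request rejected locally")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

With a breaker like this in front of B, A stops sending requests once B has failed repeatedly, which gives B room to recover instead of being clobbered further.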

But, eventually you end up with this. You now have lots of microservices. Those microservices are components of different applications. So, in this case for example, you might have an application graph where one particular customer request goes A to G to M and A to G to I, and I calls B. And another application service graph, for another particular request a customer or a user might make, might be A to H to P.

So, we now have this very large spider web worth of microservices. And now managing all those things we just talked about becomes much more interesting. Frankly, it becomes unmanageable. If we don’t do something, what we end up with is some very substantial pathologies. What we end up with is, for example, some developers write their circuit breaking and back off mechanisms one way.

Someone uses … Say some developers use an exponential back off. If I can’t connect to a service, I back off for a second. If I still can’t, I back off for two, then four, then eight, then 16, then 32[inaudible 00:08:49]. If the next thing down the line has a back off, maybe they do a linear back off, or a cliff back off where they back off five seconds

I start putting a service together, a graph of these, I now have different characteristics at different points in the infrastructure. And we’ve seen folks who have things where you might have 15 minutes worth of back off delay in a pathological case, because every developer approached the problem in a slightly different way.
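
A small Python sketch shows how mismatched back off schedules can add up. The exponential schedule (1, 2, 4, 8, 16, 32 seconds) comes from the example above; the flat five-second schedule, the number of retries on the second hop, and the way the two hops compound are assumptions chosen only to illustrate the effect.

```python
def exponential_backoff(first_delay, attempts):
    """Delays in seconds: first_delay, then doubling on each retry."""
    return [first_delay * 2 ** i for i in range(attempts)]


def fixed_backoff(delay, attempts):
    """The same delay before every retry (a 'cliff' back off)."""
    return [delay] * attempts


# Hop 1 uses the exponential schedule: 1, 2, 4, 8, 16, 32 seconds.
hop1 = exponential_backoff(1, 6)
# Hop 2 is assumed to wait a flat 5 seconds, six times.
hop2 = fixed_backoff(5, 6)

# Assumed worst case: every retry on hop 1 triggers a full retry cycle on hop 2.
worst_case = sum(hop1) + len(hop1) * sum(hop2)
print(sum(hop1), sum(hop2), worst_case)
```

Under these assumptions two hops already spend about four minutes waiting, so a longer chain of independently chosen schedules can plausibly reach the 15 minutes mentioned above.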

Same thing with circuit breaking. Same thing with everything else. There’s not a common view of how to do this within the organization, because that code is embedded in each

So, what does a service mesh like Istio provide you to be able to control this? The concept behind a service mesh like Istio is that one of the things it can do is traffic control. So, this means that you can define rules that are end-to-end across your estate of microservices.

You can say, for example, for a given service, how the load balancing might take place. 90% of the traffic goes from microservice A v1 to B v1, but 10% goes from A v1 to B v2. You might be able to do things that are geographically based. Such that, for example, if the request came in from Europe and was handled in a European front end, I’m gonna make sure I send that request to the backend also in a European data center if at all possible, rather than sending it across the ocean.
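
The 90/10 split described above can be sketched in a few lines of Python. This only illustrates weighted selection; it is not Istio configuration, and the version names and the `pick_version` helper are hypothetical.

```python
import random


def pick_version(weights, rng=random):
    """Pick a destination version according to percentage weights."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical rule: 90% of calls from A v1 go to B v1, 10% go to B v2.
rule = {"b-v1": 90, "b-v2": 10}

rng = random.Random(42)  # seeded only so the demo is repeatable
sample = [pick_version(rule, rng) for _ in range(10_000)]
share_v2 = sample.count("b-v2") / len(sample)
print(f"share sent to b-v2: {share_v2:.1%}")
```

Because the rule is just data, changing the split means editing the weights, not the code of service A or service B.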

I might also route traffic based on device or browser, how somebody approached me, so I might send this to different rendering engines depending on that. There are similar things for downtime upgrades and rollbacks, and it provides standard mechanisms for things like circuit breaking, retry and back off, and failure handling. Basically what an Istio environment does, or some of the other service mesh platforms that are out there, is give you a standardized way of defining these behaviors, and a standardized tool set.

No longer does each of the developers have to write a circuit breaking algorithm or a retry and back off algorithm or a failure handling mechanism. All of that is handled by the service mesh infrastructure itself, in this case Istio. So, the developers can get back to writing their functional code and rely on the underlying service mesh to take care of all of the housekeeping things that are basically better served if they are treated holistically, and in a standard way across the organization.

So, let’s talk a little bit about some of these. One example is end-to-end service control. If we go back to the graph from two spots ago, that you would have seen … Actually, let’s go back up to the graph two spots from now. You’ll notice that, for example, A is a component of … Is a provider to microservice S, and it is a client of microservice G and microservice H.

So, it’s actually part of probably three service graphs here. So, if we go down now back to this point, what you really need to think about … This is what your users are interested in, is the behavior of the service they’re buying, not the intermediate components of that service.

So, the fact that one application does A to H to P to Zed, and another application makes use of S to A to G to V, the end user is interested in the result they get out of A. And therefore, the flow from A to G or the flow from A to Z, A to Zed, not the intermediate points. If the developers are writing all of this functionality in each component, you

We’re treating these as A to B, or A to H, and then B to D or H to Papa, to P, etc. And each of those is independent. What I really need is that end-to-end view, ’cause that’s what I really need to assure the customers that that whole application chain is developed, or delivered, as one coherent response that behaves correctly.

So, one of the things that you need to be able to do, is do end-to-end service control. If you’re implementing service mesh characteristics at each microservice, you do not get this end-to-end control. Whereas if I have a platform like Istio, I actually can have end-to-end visibility and end-to-end control over this. At the end of the day, I don’t want my latency to be more than N milliseconds across this graph.

If I’ve got a back off happening between the ingress gateway and workload A, that’s gonna consume part of my latency that I really need from ingress gateway to egress gateway. So, therefore I might need to adjust for that in the workload A to workload B session. I can’t have that global view if I’ve only been treating the individual things.
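
One way to picture this end-to-end budget is as a deadline that shrinks as each hop spends time. The Python sketch below is an illustration under assumed numbers (a 500 ms budget, 200 ms already spent, two hops left) and an assumed even split; it does not describe how Istio allocates timeouts.

```python
def remaining_budget(total_ms, spent_ms_per_hop):
    """Return how many milliseconds are left for the hops still to come."""
    return total_ms - sum(spent_ms_per_hop)


def timeout_for_next_hop(total_ms, spent_ms_per_hop, hops_left):
    """Split what is left of the budget evenly across the remaining hops."""
    left = remaining_budget(total_ms, spent_ms_per_hop)
    if left <= 0 or hops_left <= 0:
        return 0
    return left / hops_left


# Assumed numbers: 500 ms end to end, ingress gateway -> workload A took
# 200 ms because of a back off, and two hops are still to come.
print(timeout_for_next_hop(500, [200], hops_left=2))
```

A component that only knows its own hop cannot do this calculation, which is the argument for a global view.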

I’ve instrumented A independently, I’ve instrumented B independent of instrumenting the ingress gateway, etc. I need a holistic view of how this thing is behaving. So, one of the things Istio does, Istio allows you to model your application rather than the individual microservices. It models it as a set of microservices, but you can then define what your end-to-end characteristics should be.

So, another one as part of this is service discovery and its related characteristic of load balancing. So, in Istio you define what a service is, you define all the service components. In this case, service A and service B. The way Istio works is Istio injects a proxy into each pod that you want to be using Istio with, which is usually most or all of your application code.

There’s now a proxy in there that actually handles the traffic between your microservices. And also, if you’re using things like Istio ingress, etc., from your ingress into your services to begin with. So when a given pod, for example one offering service B, comes up in the infrastructure, is spun up by Kubernetes’ load balancing or whatever else, that pod will register not only with Kubernetes, it also registers with Istio.

So now Istio maintains a full map of the service B endpoints. Those might be different versions of code, we’ll talk about that in a little bit, etc. But they are the things that have been identified as offering service B. So now, instead of the service A code itself understanding service discovery, service A just basically asks, for all intents and purposes, its local proxy to find and deliver traffic to service B.

And that’s entirely up to the proxies on both sides to make that happen. In that case, the proxy on service A will request the current list of things offering service B, and the Istio management processes, Mixer and some other things, are going to reply back saying, “Go talk to these services in this order,” or, “Only talk to this instance of this service,” i.e. to enforce some kind of load balancing.

So, the Istio manager will take a look at who service A is, any of the metadata that might come along with it, and make a decision. Or at least provide hints, depending on how things are done, as to which instance or instances of service B service A should try to connect to.
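
The discovery flow described above can be sketched as a toy registry plus a toy proxy. This is only a model of the idea, assuming a simple round robin choice; the `Registry` and `Proxy` classes and the addresses are hypothetical and do not correspond to Istio components or APIs.

```python
import itertools
from collections import defaultdict


class Registry:
    """Toy stand-in for the control plane's map of service endpoints."""

    def __init__(self):
        self._endpoints = defaultdict(list)

    def register(self, service, address):
        self._endpoints[service].append(address)

    def endpoints(self, service):
        return list(self._endpoints[service])


class Proxy:
    """Toy sidecar: the application only names the service it wants."""

    def __init__(self, registry):
        self.registry = registry
        self._counter = itertools.count()

    def route(self, service):
        candidates = self.registry.endpoints(service)
        if not candidates:
            raise LookupError(f"no endpoints registered for {service}")
        # Simple round robin across whatever is currently registered.
        return candidates[next(self._counter) % len(candidates)]


registry = Registry()
registry.register("service-b", "10.0.0.11:8080")  # made-up addresses
registry.register("service-b", "10.0.0.12:8080")
proxy = Proxy(registry)
print([proxy.route("service-b") for _ in range(3)])
```

The code of service A never sees the endpoint list; it only asks its local proxy for "service-b".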

What’s interesting is how this then affects things like retries and back offs and failure detection: Istio, now having sort of a global view, can steer different instances of service A to talk to different instances of service B, and try to adhere to the overall specification you wrote for the overall service. In this case it is made up of A and B.

Some of the potential use cases of this capability are canary deployments. If you think about it you might roll out a new version of the code. Maybe this is a mobile application and you’ve got a list of users in your beta application, either in Google Play or in the Apple app store. So, you’ve got your list of beta users. And when you roll out this new version of the code, you wanna try it on your beta users first.

So, you release a new version of service B; let’s say it’s called Service B Prime. And you tell Istio that’s only for users in your beta group. So, the nice thing about this is the developer of service A doesn’t need to know about canary deployments. And who knows, service A may be the thing that gets canaried next. So, they basically should never have special code in those services to handle this use case.

But now when service A makes a request through its proxy to connect to service B, one of the pieces of metadata that you told Istio is interesting is the user. So, when that request comes in from service A’s proxy, Istio’s gonna look at it and say, “Oh that user’s in the beta list.”

So, instead of directing you to service B, I’m gonna direct you to service B prime. And therefore, the users in the beta list get your canary deployment, get your new code, and everyone else continues to be at your GA code. And no developer had to do anything. The person who defines the service had to make the change.

Now, let’s say your code isn’t spectacular and you annoy a number of your beta users. And they all go in and get off of your beta list. It’s probably a bad outcome, but it’s better than having all your customers do that. So, the next time user Alice goes to your site, because they removed themselves from the beta list, that request goes through service A, and service A presents the user to Istio.

The Istio Mixer in this case looks at it and says Alice is no longer in the beta list, so traffic to … Istio will send the traffic to the GA version of B, not B prime. And you now stop annoying Alice. So, that’s all in the use of canary deployment.
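
The beta list decision in the Alice example reduces to a membership check made outside the application code. A minimal Python sketch, with hypothetical service names and a hypothetical `choose_backend` helper:

```python
def choose_backend(user, beta_users):
    """Send beta users to the canary, everyone else to the GA version."""
    return "service-b-prime" if user in beta_users else "service-b"


beta_users = {"alice", "bob"}
print(choose_backend("alice", beta_users))  # Alice is on the list: canary
beta_users.discard("alice")                 # Alice leaves the beta list
print(choose_backend("alice", beta_users))  # next request goes to GA
```

Neither service A nor service B changes when the list changes; only the routing decision does.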

And we have a question. So, we have a question of can Istio with RBAC be used as an example? Basically, however you want to indicate that a given user … If you’re using Istio RBAC as a mechanism to do identity, to decide if you should be getting to canary deployments or not, that’s fine.

So you could be using Istio’s RBAC, you could be using some other mechanism. However, you are representing the users, or authenticating the users in your system, ’cause users wouldn’t really be integrating … Wouldn’t necessarily be doing RBAC directly with Istio. But if there was … Or, applications may not. If they do, certainly you could do that. Otherwise, however else you’re doing your RBAC for your application graph is fine.

So, now if we think about … So, for an example there’s another canary deployment. I talked here about users on your beta list. You also might decide that you’re gonna randomly subject five percent of your users to the new service. So, in this case Istio will keep track of the number of users, the number of requests that have been made, and will basically apply a randomization, if you wanna think of it that way, and say it’s gonna take five percent of all requests and send them to B prime, and the rest go to B.

And then one of the things which we’re not gonna talk about today, Istio gives you lots of visibility into use of your service graph. Even Istio could tell you to a certain degree, how long people … How long those users stay engaged. You can look at is my service driving people away or holding them longer? And obviously your application could also return some of that information.

But, this gives you an ability to see if your five percent deployment is working or not. Similarly, you might decide to use canary deployments to put out a hot patch for a specific user client. You’ve got an issue, it’s sort of for a specific user client, say an iPhone. So, I’m gonna use something like a canary deployment to say, anything coming from an iPhone … Or from an iPhone as a client, gets pushed into my canary B prime. Everyone else gets to … Is still delivered to standard B.
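
Routing on a client attribute such as the device can be modelled as an ordered list of match rules with a default. The sketch below is illustrative only: the header name follows HTTP convention, while the rule format, the `route_request` helper, and the service names are assumptions.

```python
def route_request(headers, rules, default):
    """Return the destination of the first rule whose predicate matches."""
    for matches, destination in rules:
        if matches(headers):
            return destination
    return default


# Hypothetical rule: anything whose User-Agent mentions iPhone gets the patch.
rules = [
    (lambda h: "iphone" in h.get("User-Agent", "").lower(), "service-b-prime"),
]

print(route_request({"User-Agent": "ExampleApp/1.0 (iPhone)"}, rules, "service-b"))
print(route_request({"User-Agent": "ExampleApp/1.0 (Android)"}, rules, "service-b"))
```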

So, that’s a … A couple of canary and canary-like use cases. Another one which is similar is a dark launch. Let’s say you’re doing some form … It’s especially useful when it’s not interactive. So, maybe you’re deploying a new version of your analytical engine to your application. You could tell your service graph to replicate all the traffic that it’s sending to your current GA version of your analytical engine, and also send that data to now your new test analytical engine.

So, this way you can actually launch off a new platform, and users would still be using the existing platform, but you’re now getting real world use of your new version or your new service. And your users don’t actually even know that’s going on. It’s going on behind the scenes. The nice thing about all of these things, is the application developer of microservice A and microservice B, have absolutely no knowledge even that this is happening.

There is no code in there that says, “Oh, I’m on this branch versus this branch, so I have to do this or that.” Or, out of every 10 requests I need to send one request to the DNS entry of service dot … No, beta dot service, versus GA dot service. None of that logic’s in there. You don’t need to build configuration options for any of that, etc. All of that’s now pushed up into the service mesh layer.

We’ll go to the next one. So from a load balancing standpoint, again this is a … You can use load balancing as an instance of, like I said, canary. So we’ve covered that already. Other use cases though, you could use in this case, versioning. So that’s a little bit like a canary deployment, but it’s maybe … It can also be used in something other than canary deployment.

You start off with a new version, then you wanna ramp it up. So, it’s not just I wanna send five percent there. I canaried it, it worked. So, over time over the next couple of days, I keep on adjusting the ratio until everything is off my old version onto my new version. There’s another use case though, where you could say that I know that some of my nodes, or I know that some of my pods instances are lower throughput than others.

In fact, Istio can help you figure that out. Istio can say, “This version of this deployment or these specific pods are slower or can handle less load than others.” And Istio can do that ’cause Istio’s watching all the metrics of how fast all these services respond. Maybe that’s because some of those pods are deployed on an older version of your hardware [inaudible 00:26:50] and newer versions of the pod are deployed on newer versions, and you’re not gonna burn that old cluster down quite yet.

So, you can actually say, “Okay, the versions of the pods that are slower will get a total of 25% of my traffic, and the rest will get 75% of my traffic.” And again, Istio can even automatically do that for you. Do that load balancing for you based on actual real world behavior of these different pods. And will adjust as things adjust.
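
One simple way to derive such a split from measurements is to weight each group of pods inversely to its observed latency. This is only one possible heuristic, shown as a Python sketch with made-up latencies; it is not a description of what Istio does internally.

```python
def weights_from_latency(avg_latency_ms):
    """Give each pod group a share inversely proportional to its latency."""
    inverse = {name: 1.0 / ms for name, ms in avg_latency_ms.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Assumed measurements: old hardware answers in 300 ms, new in 100 ms.
print(weights_from_latency({"old-hw": 300, "new-hw": 100}))
```

With these assumed latencies the slower pods end up with roughly a quarter of the traffic and the faster pods with roughly three quarters, matching the 25/75 split in the example.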

So, load balancing doesn’t necessarily just have to be for things like canary deployment, it can be because not everything is completely homogenous in your infrastructure. And you need to be able to adjust for that.

Another interesting case is … Another interesting thing which I talked a bit about, was circuit breaking and retry. So, if I’ve got this service graph, service A, B, and C, is what’s delivering my service. And if each of my developers independently decided what their timeout is going to be, what their retry algorithm’s gonna be, how many times they’re gonna retry, what their back offs are, etc.

You can end up with some pathologies here. If service A decides its timeout is … Yeah, in this case, 9000 milliseconds, for example, and service B to C has a timeout of 3000 milliseconds and a retry of … I said two here, but let’s say four times, and a back off of doubling each time … Well, what you might actually get is that service A to service B times out.

While service B to service C is still trying to make the request. And actually that request does get fulfilled, but not before service A times out. So, then service A starts again, but it backs off again, and wash, rinse, repeat. And all of a sudden you end up with these pathologically long outcomes where actual latency across this is quite high. Sometimes things work, sometimes things don’t.
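
The arithmetic behind this pathology can be checked with a short Python sketch. It reads the service B to C timeout as 3000 milliseconds (an assumption about the intended unit), uses the four retries and doubling back off from the example, and assumes a first back off of 1000 ms, which the talk does not specify.

```python
def worst_case_ms(timeout_ms, retries, first_backoff_ms):
    """Longest time a caller can spend: every attempt times out, and the
    wait between attempts doubles each time."""
    attempts = retries + 1
    backoff = sum(first_backoff_ms * 2 ** i for i in range(retries))
    return attempts * timeout_ms + backoff


a_timeout_ms = 9000                     # service A waiting on service B
b_worst = worst_case_ms(3000, 4, 1000)  # service B calling service C

print(b_worst, b_worst > a_timeout_ms)
```

Under these assumptions service B may keep trying for far longer than service A is willing to wait, so A gives up and starts over while B is still working on the previous request.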

It all comes down to how the timing is set up between all of these services. In this case it’s sort of easy to fix this problem, A to B to C. Just fix … You look across all these and you set the numbers correctly, and you’re fine. But what if B is also connecting to service D, because service X connects to B, and it’s asking for something from service D. And service D has different timeouts.

Service B still has the same timeouts whether it’s A or D. All of a sudden you now end up having to balance all of the settings, the timeouts, the retries, the back offs, etc. independently for each service. So, every time I add a new dependency on my service graph, I have to potentially reconfigure everything else in order to make sure that everything still works. This becomes a very fragile, if not just completely broken, model.

The model in Istio is you can say, “This is my end-to-end behavior I want. Here are the minimum or maximum characteristics that can be handled by each of the components.” And then Istio will manage that, because it controls the proxies which control the actual traffic between the microservices. Istio can then instrument and control: well, in this case, I need a timeout of no more than this in order to be able to meet the end-to-end goals.

So, managing this linearly looks simple. When you go back to the original graph that I had up there on the screen, this becomes impossible to do manually on an A to B basis. You really have to look at this from an A to Zed basis. And the only way you’re really gonna do that is have something that provides sort of an over-watch across your entire infrastructure.

So, that’s circuit breaking and retry. So if you think about, and we’re gonna talk about other Istio bits in the coming sessions. But in reality, you start thinking about you need to be able to holistically look at things like circuit breaking and retries and back offs in order to make sure you can deliver an end-to-end service your customers want.

You need to be able to have different load balancing characteristics based on current behavior of systems, based on your needs to do canary deployments or dark mirrored deployments, or dark launches. There are a number of similar cases. Being able to do that in one place is really, really useful. And that’s one of the big drivers for Istio.

There are other drivers for Istio, but this is one of the big ones. This means that your developers can get back to writing this code that’s important to service A, service B, service C, and not deal with deploying. So, yeah, why are we talking about this? You hear me harp on about security and core networking, etc. So, why are we talking about this?

One of the things that we have done in Tigera’s solutions and in our open source solution, is we have integrated with Istio from a security standpoint. We’ve talked about this before, and we’re gonna talk about this again in an upcoming webinar. But it’s the ability now to take your Kubernetes network policy and also apply it to the layer seven characteristics that Istio looks at.

So, [inaudible 00:32:40] HTTP layer, and enforcing encryption and doing filtering based on actual HTTP or gRPC objects. So, this is why we … What we have done … We see Istio as being very important not just from a security standpoint, but in general in deploying a microservice architecture. For the traffic management, for the visibility, which we’re gonna talk about as well, etc.

So, this tool, Istio, is quite valuable in many aspects. That’s why Istio has gained a lot of traction. So, because it’s becoming a major player in this space, we have decided to extend our network policy up into Istio as well. So this is the intersection between Istio and Tigera at this point. And again, like I said, we’ll be talking about this a little bit later.

So, that is a little bit on Istio and Istio traffic management and why you should care. And some of the things you can do with it. With that, I think I’m gonna turn it back over to Michael and open it up for questions.

Michael: Thank you, Christopher. So, before we get to questions, we’ll give you a chance to type them in. We have some upcoming … We have an upcoming webinar in two weeks. In fact, we do a webinar every two weeks. So, the next one coming up is on April 17th, and it is on securing Kubernetes applications on the Google cloud.

You can also watch any of our past webinars through BrightTALK or at our website. And currently, we’re highlighting a webinar we did with AWS on Atlassian. It’s a case study on Atlassian and how they have moved securely to the cloud. And it’s a really great webinar.

Hold on, we do have a question that came in. We have a bunch coming in now. Hold on. Hold on a second here. Everybody sit back down. Get comfortable. So, we have a question here: is Istio more for routing or can it be used for security? I think you covered a little bit of that. But maybe-

Christopher: So, I think … The answer is yes. We see it … We actually see three major use cases for Istio. We see one for security of the application flows themselves. And one of the advantages in Istio, and I’m [inaudible 00:36:28] a few of the things we’re gonna talk about later, is Istio can also do things like automatically drive mutual TLS authentication between microservices.

It does that automatically. We can drive that with a finer grain of capability with our network policy in Istio. But, Istio can make a determination: is a given microservice allowed to make a specific request of another microservice? I.e. is this thing only allowed to post to a given set of URIs, or is it also allowed to get from those URIs?
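
The kind of decision described above (may this caller POST to these URIs, or also GET from them?) can be sketched as a lookup against an allow list. This Python model is illustrative only; the policy format, the service names, and the paths are assumptions and do not reflect Istio or Calico policy syntax.

```python
# Hypothetical policy: (caller, HTTP method, path prefix) tuples that are allowed.
ALLOWED = {
    ("service-a", "POST", "/orders"),
    ("service-a", "GET", "/orders"),
    ("reporting", "GET", "/orders"),
}


def is_allowed(caller, method, path, allowed=ALLOWED):
    """Allow the request only if some rule matches caller, method and path."""
    return any(
        caller == c and method == m and path.startswith(prefix)
        for c, m, prefix in allowed
    )


print(is_allowed("reporting", "GET", "/orders/42"))   # read access is listed
print(is_allowed("reporting", "POST", "/orders/42"))  # write access is not
```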

It can also then, like I said, drive mutual TLS and therefore encryption of those flows as well. It can do, again, all of that without the developers having to specifically know that. So, again it allows you to abstract that plumbing away from the developers.

It also means that things like Heartbleed no longer require you to refactor every single application in your infrastructure, because the actual encryption is being done by the Istio proxy, which is Envoy, actually, right now, in every pod. So, you just need to update Istio’s proxy with the now new version of OpenSSL or whatever that is Heartbleed

So, it sort of moves a lot of that crypto-plumbing and making sure that everyone’s using the right versions and doing the policy enforcement on the layer seven flows, that’s all the security capabilities. Use case two what we see at a high level for Istio is visibility. Being able to see how your different microservices perform, what different versions perform better than others.

What geographies, how your API’s are performing in general. Where your bottlenecks are, where are you taking a really long time to make a response in overall. This application … Everyone loves it. This application is slow. My customer web order form is slow. And why is that? Well, if I can actually see that entire service chain, the 20 microservices that are involved in painting that order form, I can see all the traffic between all those microservices, and I can see where the latencies are, etc. I can maybe go fix it.

So, visibility really important as well. And then like your question, routing or traffic management, similarly very important. The nice thing about all of these things is they’re inherent in Istio. I don’t need to go build them into each of my microservices and have each developer do it slightly differently. I hate to blag off on developers, I love them greatly. But, if you have 15 developers developing 15 different microservices, you will have 18 different ways of doing back off and 22 different ways of doing visibility, etc.

So, having one … Having it done one way is useful. What becomes even more useful though is how these things get put together. Based on the visibility of the latency of these API’s, the load balancing immediately gets automatically adjusted based on visibility of flows. You could use anomaly detection and use that to change your security posture.

And so, all of these things not only independently are quite valuable, when you start putting them together, is where you get some incredible value out of this. So, I think different people come to Istio for different reasons. Very often it’s the traffic management is the thing that drives people originally to Istio. But once you get to the …

Or … And sometimes it’s the visibility. And sometimes it’s the security. But once you see these, they all sort of play together and give you one whole that is certainly greater than the sum of its parts.

Do you see Istio for DevOps? Certainly you do. Let’s think about a DevOps cycle. Now, for example, you can say my test microservice should only receive traffic from my testing harness. When I wanna promote it to production, as an example, I don’t need to change anything in the code.

I just change the routing rules in Istio and say, now I want all the traffic to go to prod. I want traffic from the prod workloads to now be steered to this new instance, versus being steered to the previous instance. I don’t need to change anything in the code, which means you can actually test and evaluate the code and the configuration you’re actually gonna deploy. The only shifts you’re gonna make, potentially, are in Istio.

Another thing you can do from a DevOps perspective is actually watch the behavior and performance of your APIs. So, now in real time or close to real time, how your APIs are performing can feed back into your sprints. And you can say, “Okay, well we made this change, and our API latency just went up by 40 milliseconds. Can we fix that?” Versus discovering that at the end of the test cycle.

Next. Okay. Is Istio production ready? Yeah. Istio is production ready. We are seeing people deploying Istio in production, so yes.

Can Istio handle pod security policies? Istio [inaudible 00:42:19] along with admin privileges. There’s an alternate called CNI plug-in. So, I think this … First of all, Istio and the CNI plug-in are different things. So, the CNI plug-in … The CNI provider is something like Calico, etc. It provides the underlying network, the layer three network that is necessary to support pod-to-pod communications in general, let alone pod-to-pod communications with Istio.

So, you could do network policy within Istio. However, that means that all traffic, including non-layer seven traffic, or non-[inaudible 00:43:08] traffic, would have to go through a userland proxy. And that is a bit of a performance issue. That’s one problem. Another issue is that it means that all policy is being enforced within the pod. If the pod becomes compromised, the pod is …

If the pod becomes compromised, you lose all ability to control that pod, both ingress and egress. If you take a look at a number of the implementations of Kubernetes network policy, like Calico, a lot of those are enforced outside of the pod boundary in the underlying host. So, if you take this concept of zero trust, for example, you can use Istio to provide control of layer five through seven within the pod, and you can use Kubernetes network policy to control outside of the pod in the host namespace.

And one of the things that we’ve done in Project Calico and in the Tigera solution is we’ve extended network policy to allow you to drive both Istio security policy and Kubernetes network policy off of one single object.

So, versioning and what if … So, another question is about versioning and what-if changes to Istio routing rules. And there’s a couple other questions here, like [inaudible 00:44:36] and zero trust, and intrusion detection. I don’t quite understand what those questions … Oh, okay.

So, the first question is versioning and what-if changes to Istio policy rules. So, I think what the questioner is asking here is, can you do what-ifs? And that’s sort of the whole idea behind a canary deployment. You can have a small percentage of your traffic view one … If you have a what-if question, you can take a small percentage of your traffic and filter it off to your what-if scenario, your new version of whatever.

Or, I wanna route this through this other microservice first that I wasn’t routing it through before. Maybe a machine learning jig or something along those lines. Or you could even do a dark launch. And that allows you to do a what-if. If it doesn’t work, you just flip that off, set that balance down to zero percent, and you’ve failed back to a known good working state.
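The what-if described here can be sketched as weighted routing. This is a minimal Python simulation, not Istio configuration or Istio's API; the destination names `stable` and `what-if`, the 95/5 split, and both helper functions are hypothetical.

```python
import random


def pick_destination(weights, rng):
    """Pick one destination with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def traffic_share(weights, requests=10_000, seed=0):
    """Simulate `requests` routing decisions and return each destination's share."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_destination(weights, rng)] += 1
    return {name: count / requests for name, count in counts.items()}


# Send a small percentage of traffic to the what-if version.
canary = traffic_share({"stable": 95, "what-if": 5})

# If it doesn't work, set that balance to zero: everything goes back to stable.
rolled_back = traffic_share({"stable": 100, "what-if": 0})
```

The point of the sketch is that the rollback is only a change to the weights; nothing in either version of the service has to be rebuilt or redeployed.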

So … In a previous session, we discussed permission-based routing: A can talk to B and C, but C only listens to A, as better than firewall rules. Can you add to that? So, again, if I think about it, I have two parties that need to agree that the communication should happen. So, let’s say I’ve got three microservices, Alpha, Bravo and Charlie.

The developer for A, for Alpha, believes that it should be able to talk to Bravo and Charlie. But the developer of Charlie believes that it should only be able to receive traffic from Alpha. So, this could be a use case where the developer of A is writing a front end and he thinks he should be able to talk to multiple back ends.

But another developer thinks that only, say, PCI-enabled front ends should be able to talk to my particular service, because my service is PCI-contaminated. So, in that case, even though A has said it should be able to talk to B and to C, the question here was that C should only listen to A. More importantly, B is saying, ’cause B is PCI-contaminated, that it should not be allowed to be connected to by A, because even though it’s a back end, it is PCI-contaminated and A is not PCI-authorized.

So, in that case we have a mismatch in the graph. If we just relied on A’s rules, then A would be able to talk to B, which would be a PCI violation. So, both sides of the communication need to agree to the communication. This isn’t just an Istio thing. This can be enforced partially in Istio, partially in network policy, or all in network policy, or all in Istio.

But in a zero trust environment, or a least-privilege environment, both sides of the communication need to be able to agree that that communication should be allowed. So, yes, you can do this better than with a classical single-point firewall. You can distribute this, and you can make this enforcement happen at multiple points.
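The "both sides must agree" rule can be sketched in a few lines of Python. This is an illustration only, not Istio or Kubernetes network policy syntax; the `EGRESS` and `INGRESS` tables, the `is_allowed` function, and the choice to give Bravo an empty allow list (standing in for the PCI restriction) are assumptions.

```python
# What each service's developer says it may talk TO (hypothetical data).
EGRESS = {"alpha": {"bravo", "charlie"}}

# What each service's developer says it will accept traffic FROM.
# Bravo is PCI-contaminated and Alpha is not PCI-authorized, so Bravo
# does not list Alpha; Charlie accepts Alpha only.
INGRESS = {"bravo": set(), "charlie": {"alpha"}}


def is_allowed(source, destination, egress=EGRESS, ingress=INGRESS):
    """Allow a connection only when both ends agree to it."""
    source_agrees = destination in egress.get(source, set())
    destination_agrees = source in ingress.get(destination, set())
    return source_agrees and destination_agrees


alpha_to_charlie = is_allowed("alpha", "charlie")  # both sides agree
alpha_to_bravo = is_allowed("alpha", "bravo")      # only Alpha agrees
```

Relying on Alpha's rules alone would have let the Alpha-to-Bravo connection through; requiring agreement from both ends is what closes that gap, wherever the two halves of the check end up being enforced.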

Michael: Okay. Great. Well, hey, listen. Everyone, thank you so much for all your questions. I knew you had it in you. Very proud of you all. And so, with that we are going to conclude today’s webinar. And we will see you next time. So, Christopher, thank you. And thank you, everyone, for attending.