Michael (Host): Hello everyone. And welcome to today's webinar, "Leveraging Kubernetes Services and DNS". I am pleased to introduce today's speaker, Christopher Liljenstolpe. He is the original architect behind Tigera's Project Calico, and he speaks at a ton of meetups every year educating people on networking and network security for modern applications. He also consults with Tigera's enterprise clients on their security and compliance for modern applications, and especially on Kubernetes, Istio, and all of those kinds of things.
Michael (Host): So before I hand the webinar over to Christopher, I have a few housekeeping items I'd like to cover. First off, today's webinar will be available on-demand almost immediately after the session is finished. It just takes time to process, about five minutes, and then it's available, and is available through the same link that you used to get this webcast. And also, we'd love to hear from you during today's presentation, so we will have a Q&A session at the end, but if you have questions please go ahead and enter them into the question bar on the BrightTALK interface when you have them, and we will try to get to those questions during the content. That makes it far more relevant to the question ... we answer right then and there rather than waiting to the end. And if you want a copy of today's slides, you can go ahead and just let us know, and we will get you a copy of those slides. You can do that in the feedback bar, you can pop it into a question. And that is all for housekeeping. So without further ado, I'd like to hand it over to Christopher for our topic today. Christopher?
Christopher: Sure. And actually, before we get to the topic, I've got a bit of housekeeping as well. As Michael said, I do a number of meetups as well as webinars, and I find myself in different parts of the country when I go talk to customers on a fairly regular basis, and usually when I'm doing those I try and see if I can attend or speak at meetups wherever I happen to be located. So if you're out there listening to this and you've got a meetup group that you participate in that you think would be interested in having me talk about Kubernetes networking security policy, Istio, and, I don't know, motorcycle repair and 80’s punk rock bands from the Midwest, feel free to let us know via the contact information as well, and we'll see if we can set something up the next time I happen to be in the area. So with that, let's actually get on to the topic at hand. And the interesting thing, it's been a while since we've talked about this, and it's mainly services and DNS, and how do you expose services, and some changes that have been made recently in Kubernetes and with our product around DNS and service exposure. So it's been a while, and we still get this fairly common set of questions from our customer base, so I thought it'd be worthwhile to talk a little bit about it today in the meetup ... oh yeah, webinar.
Christopher: So if we go to the first thing ... what are we talking about today. Kubernetes. It's great, as we're coming up on Christmas-time, my pods are deploying, my deployments are life-cycling, it's all good. Everything's auto-scaling. You just survived your Black Friday events and everything's working great. But I've got these microservices, I've got these pods, how do I find and consume them within the cluster? And before we've talked about things like services, etc. Kubernetes has got a services mechanism. We'll review that a little bit as well. And also, more importantly, how do I consume that from the outside? It's not all that useful if we're just talking to each other. So how do we consume these services and connect to these pods and microservices from the outside world?
Christopher: So without further ado, that's what we're going to spend a little time talking about today.
Christopher: Today we have two pods and they need to talk to one another. Each of them has an IP address in Kubernetes, and they connect to each other using those addresses. We'll talk a little bit later about how we find those addresses, but pod B makes a connection to pod A. That's great. But what if we start scaling pod A? It's been life-cycled, or it's auto-scaling. We now have a bunch of pod As. So how does pod B know which pod A or pod As to connect to, and how do we load balance across them? Do we just keep updating pod B with all the new pod A addresses? That doesn't seem like a very scalable or modern architecture.
Christopher: So what Kubernetes gives me the ability to do is to define a virtual entity called a service. And in this case, service A represents the different pod As that offer that service. And we'll talk about ... we'll show you the configuration for how that's done, but basically we create this entity that says service A is a set of pods that all meet this set of criteria, in this case, pod As. So service A is now this virtual container, and it points to all of the different pods in the cluster.
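As a rough illustration of a service being "a set of pods that all meet this set of criteria", the sketch below matches pods against a label selector. The pod names, labels, and IP addresses are invented for the example, and the matching logic is a simplification rather than Kubernetes' own code.

```python
from typing import Dict, List

def select_endpoints(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the IPs of pods whose labels contain every selector key/value pair."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]

# Hypothetical pods: two "pod A" replicas and one "pod B".
pods = [
    {"name": "pod-a-1", "ip": "10.0.1.4", "labels": {"app": "a"}},
    {"name": "pod-a-2", "ip": "10.0.2.7", "labels": {"app": "a"}},
    {"name": "pod-b-1", "ip": "10.0.3.9", "labels": {"app": "b"}},
]

# "Service A" is every pod carrying the label app=a.
service_a_endpoints = select_endpoints({"app": "a"}, pods)
print(service_a_endpoints)
```

Scaling pod A up or down only changes the list of pods; the selector, and therefore the service definition, stays the same.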
Christopher: So those pods we call endpoints. Now when you have a pod that's offering a service, it is an endpoint for that service. It is something that will answer for that service if there's a request for it. So how do we find this? First of all, we have to create service A, and that's created by a Kubernetes resource, and I'll show you that in a little bit. And when we do that, Kubernetes maps that service A to the endpoints that are offering it and creates a virtual IP. In this case, 172.16.0.5. And that virtual IP is a proxy, if you want to think of it that way, for the service itself. So what then happens is there is some kind of mechanism, and there are different mechanisms in Kubernetes to do this, that will answer traffic for that virtual IP and then load balance it among the endpoints of the service.
Christopher: So pod B will discover, and we'll talk about that in a minute, that service A is at 172.16.0.5 and then, by some networking magic, a load balancer will catch that traffic and load balance it between the various endpoints. There are a couple of load balancers. You can use Istio, or you could use Kube-Proxy, which is the standard Kubernetes mode. There are some others as well. But they're going to do that load-balancing and distribute the traffic to the various endpoints based on some algorithm.
Christopher: So let's look at Kube-Proxy to begin with. So what Kube-Proxy is going to do is Kube-Proxy watches the definition of service A. So service A is made up of some number of endpoints, Kubernetes maintains that. Kube-Proxy, which runs on each worker independently, is watching Kubernetes' definition of service A and is keeping a list of all the endpoints that offer that service, so in this case all those pod As that we talked about before.
Christopher: And what Kube-Proxy does is program iptables in the underlying Linux host, which basically does a load-balance by doing a destination address map. So when a user, or another pod, sends traffic to service A's virtual IP, what's going to happen is that iptables on the host they're originating the traffic from is going to see the outgoing traffic addressed to the service's IP, and will rewrite the destination IP to one of the endpoints that's currently offering the service. And then the network will transmit that traffic to that pod.
Christopher: So basically what we're doing is, at the point of ingress, the first point where the fabric touches the packet that's destined for service A, we rewrite the destination address to point it at one of the endpoints. And then standard networking takes over.
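The destination rewrite described here can be sketched in a few lines. This is a toy model, not how iptables is actually programmed: the `Packet` type, the service table, and the endpoint addresses are all assumptions made for illustration, and a random pick stands in for whatever balancing algorithm is in use.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int

# Hypothetical table: service virtual IP -> endpoint (pod) IPs currently serving it.
SERVICE_TABLE: Dict[str, List[str]] = {
    "172.16.0.5": ["10.0.1.4", "10.0.2.7", "10.0.4.2"],
}

def dnat(packet: Packet, table: Dict[str, List[str]], rng: random.Random) -> Packet:
    """Rewrite the destination of a packet addressed to a service virtual IP.

    Packets to any other destination are returned unchanged.
    """
    endpoints = table.get(packet.dst)
    if not endpoints:
        return packet
    # Pick one endpoint; only the destination address changes.
    return replace(packet, dst=rng.choice(endpoints))

rng = random.Random(0)
original = Packet(src="10.0.3.9", dst="172.16.0.5", dport=80)
rewritten = dnat(original, SERVICE_TABLE, rng)
print(original.dst, "->", rewritten.dst)
```

After the rewrite the packet carries an ordinary pod address, so nothing downstream needs to know the service IP ever existed.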
Christopher: Today that's iptables, and going forward in Kubernetes IPVS is an option as well and will become the default over time, so instead of using iptables it will be using another mechanism in the kernel called IPVS. It's slightly more performant and cleaner, and that support is coming.
Christopher: So as sort of a summary, Kubernetes services provide a logical service abstraction for a set of endpoints. It's implemented via Kube-Proxy. A long time ago it was actually a user-space proxy, and it's now iptables and going to IPVS. Kubernetes functions fairly seamlessly, actually very seamlessly, with Kube-Proxy. Many SDNs do require you to replace Kube-Proxy; we operate with the existing Kube-Proxy.
Christopher: And let's walk through a little bit of how this happens. So if you go to the next slide, there's a couple different ways of defining a service. You can create a YAML file that describes the service, or you can just use something called kubectl expose. And basically what you're saying is you're going to take a deployment, in this case the deployment is an NGINX deployment, and within a given namespace, Project Pink in this case, you are going to expose that deployment on port 80.
Christopher: So what's going to happen is now Kubernetes is going to create a service for NGINX in the Project Pink namespace, and it's going to expose it on port 80. So any pod that happens to be within the NGINX deployment will now be exposed as an NGINX service. What's going to happen is when that's done, Kube-Proxy on each node, including on the masters, not just the workers, is going to see that the NGINX service has been created and it's been given an IP address by Kubernetes, and it will collect the pods that are serving NGINX right now, the NGINX pods from the deployment, and it will write those rules as NAT rules into the underlying host kernel, identified by N on these boxes. Each node's Kube-Proxy is going to write those rules.
Christopher: What's going to happen, then, is a given pod tries to connect to the IP address for the NGINX service. It's going to be an IP address. If you're running Calico, or Tigera, it's going to be routed on the node. If you're using some other SDN it will somehow get to the underlying node from the pod, at which point iptables will pick it up, change the destination address of that packet from the service IP (which doesn't really exist as a network entity, it's just a placeholder IP) and will rewrite it to the IP address of one of the serving pods, in this case, IPE3, which happens to be on another host. Could be on the same host, could be a different host, doesn't matter, and then standard networking, routing, etc. will happen and send that traffic to the destination pod and the service will be delivered.
Christopher: So how do we discover those service IPs? So in Kubernetes, we use DNS as service discovery. It's sort of the industry standard, it's what we've used for service discovery for decades in IP networks. Today DNS in Kubernetes ... until very recently, the default DNS provider in Kubernetes was something called KubeDNS. I need to reprint this slide because as of last release, CoreDNS is the new DNS provider, although you can still use KubeDNS. But there's a DNS provider that Kubernetes works with.
Christopher: When you define a service in Kubernetes, Kubernetes will send that service creation event to the DNS infrastructure within Kubernetes and a DNS name will be generated. So if you created a service, like we just did, NGINX in namespace Project Pink, and we give it a virtual service IP of 172.16.0.5, a DNS record will be created within Kubernetes that says nginx.projectpink.svc.cluster.local is at 172.16.0.5. You can also have pods get DNS records as well. But that's how DNS gets created. So whenever you create a service, the service's name will be created as a record in DNS within a subdomain for the cluster, where that subdomain is the namespace. If you did not define a namespace for your service, it would be in the default namespace. So instead of Project Pink, it would be nginx.default.svc.cluster.local.
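The naming pattern can be written down as a one-line function. This is only a sketch of the convention: Kubernetes spells the fixed label `svc`, and `cluster.local` is the common default cluster domain, which a given cluster may override.

```python
def service_fqdn(name: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a service: <name>.<namespace>.svc.<domain>."""
    return f"{name}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("nginx", "projectpink"))
print(service_fqdn("nginx"))  # falls back to the default namespace
```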
Christopher: So that's how we find a service IP. You then program and use service names as long-lived names, and then in your application's configuration you say I need to connect to nginx.default.svc.cluster.local, or just nginx, and you will resolve that name within the Kubernetes cluster to wherever Kubernetes is offering the NGINX service.
Christopher: So what's going to happen ... this is just another way of creating a service, rather than the expose. In this case, we're creating a YAML file of kind Service, and the name of the service is going to be web-server, and it's going to be in the Project Red namespace this time. And the selector is ... we're going to select for any pods that have the two key-value pairs "name is equal to nginx" and "tier is equal to front end." Cluster IP is where we could say this is the IP we want this to have. If we don't specify one, it will get picked automatically. Kubernetes will pick an IP address.
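A manifest along these lines might look like the following, written as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts as well as YAML). The exact label values, the lowercase namespace name, and port 80 are assumptions for the sketch; the slide itself is not reproduced in the transcript.

```python
import json

# Assumed values: namespace spelling, label values, and port are illustrative.
service_manifest = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web-server", "namespace": "projectred"},
    "spec": {
        # Pods must carry BOTH labels to become endpoints of this service.
        "selector": {"name": "nginx", "tier": "frontend"},
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 80}],
        # No "clusterIP" key: Kubernetes assigns a virtual IP automatically.
    },
}

print(json.dumps(service_manifest, indent=2))
```

Saving the printed output to a file and applying it would be one way to create the service without `kubectl expose`.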
Christopher: So in this case, when you do this, again, the DNS name for this would be web-server.projectred.svc.cluster.local. When that happens, if we don't have a name server in Project Red yet, what's going to happen is Kubernetes will stand up an instance of the DNS server in Project Red. It will be running as a pod somewhere in the network in Project Red's namespace. So in this case, the pod on IP 82 is now the DNS server for Project Red. The way resolution works, by default ... in the pod you'll find an /etc/resolv.conf, which is a way of configuring how you resolve names within a Unix host or a container. And the entries will be "my ...". They will first look for a name, web-server, for example, in that pod's namespace.svc.cluster.local, then in default.svc.cluster.local, and then whatever other things you told it to look in, like your corporate DNS hierarchy.
Christopher: So if web-server doesn't exist in Project Red, the next thing we'd try is to resolve web-server in default. That means, for example, if you want to have services shared commonly across multiple namespaces, you may want to put them in the default namespace, or you may want to put them in another namespace and then add that as a part of the config to the resolv.conf that gets fed to the pod ... and that's a DNS config in Kubernetes, so you can say I want to search globalservices.svc.cluster.local as one of the domains, and then put services in the global services namespace.
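The search behaviour can be modelled as trying a short name against each search domain in turn. This is a simplified sketch: it ignores resolver details such as `ndots`, the search list simply mirrors the order described in the talk (an actual pod's resolver configuration may list different domains), and the record table and IP are made up.

```python
from typing import Dict, List, Optional, Tuple

def candidate_names(name: str, search_domains: List[str]) -> List[str]:
    """Expand a short name into the fully qualified names a resolver would try."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    return [f"{name}.{domain}." for domain in search_domains] + [f"{name}."]

def resolve(name: str, search_domains: List[str],
            records: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (matched name, address) for the first candidate that has a record."""
    for candidate in candidate_names(name, search_domains):
        if candidate in records:
            return candidate, records[candidate]
    return None

# Assumed search order for a pod in the projectred namespace.
search = ["projectred.svc.cluster.local", "default.svc.cluster.local", "corp.example.com"]
# Hypothetical record: web-server exists only in the default namespace.
records = {"web-server.default.svc.cluster.local.": "172.16.0.9"}

result = resolve("web-server", search, records)
print(result)
```

Adding a shared namespace such as `globalservices.svc.cluster.local` to the search list would make services placed there resolvable by short name in the same way.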
Christopher: So anyway. That's a little bit of how DNS works. Now let's talk a little bit ... before I go on, I do want to talk about another capability that we have added in Calico 3.4 for services. So in Calico 3.4, we now have the capability of BGP announcing service addresses. So for those of you who know about Calico, if Calico's running on private cloud we use BGP to connect the pods and the Kubernetes cluster to the rest of your infrastructure. But to date, the only thing we announced was the pods' IP addresses. We did not announce the service addresses.
Christopher: Starting with Calico 3.4, we have an ability to announce those service addresses via BGP. And we can do that one of two ways. So let's say we go back and we have those five pods offering service A, and those five pods are running on five different hosts. If you define the service as a local service, then only those five hosts hosting pod A at that point in time would announce service A's IP address. If you set it as a cluster-wide model for that service, then every node in your cluster will announce service A's IP address via BGP. So you can control who announces it, but by turning a knob now in Calico 3.4 you can enable service address announcement via BGP. And we're going to continue to add capabilities around that space in forthcoming releases. So you can now expose your service IPs outside of your Kubernetes cluster directly.
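The difference between the two announcement models comes down to which nodes advertise the service address. The sketch below only models that decision; the mode names `"local"` and `"cluster"` and the node names are placeholders for illustration, not Calico configuration values.

```python
from typing import Iterable, List

def announcing_nodes(mode: str, all_nodes: Iterable[str],
                     endpoint_nodes: Iterable[str]) -> List[str]:
    """Return the nodes that would announce a service IP under each model."""
    if mode == "local":
        # Only nodes currently hosting an endpoint of the service announce it.
        return sorted(set(endpoint_nodes))
    if mode == "cluster":
        # Every node in the cluster announces it.
        return sorted(set(all_nodes))
    raise ValueError(f"unknown mode: {mode!r}")

nodes = [f"node-{i}" for i in range(1, 9)]           # eight hypothetical nodes
pod_a_hosts = ["node-2", "node-3", "node-5", "node-6", "node-8"]  # five run pod A

print(announcing_nodes("local", nodes, pod_a_hosts))
print(announcing_nodes("cluster", nodes, pod_a_hosts))
```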
Christopher: Let's talk a little bit about DNS. I just alluded to the fact that CoreDNS is now the new DNS provider in Kubernetes. It's the new default as of 1.12. You can still use the legacy KubeDNS. The reason that we did this - we as in the Kubernetes community - is that KubeDNS was built on a sort of older technology base, and it didn't have a lot of capabilities or extensibility, and it wasn't necessarily the most robust DNS server in the world.
Christopher: So the CoreDNS team took a very well-respected server called Caddy, which started off life as a Go-based HTTP server, and said we could make a really cool DNS server out of this infrastructure. It's very scalable, very extendable, etc. So they basically took the Caddy base and flipped it over into being a DNS server. That became CoreDNS. It's a CNCF project now. Very extendable and pluggable. It's durable as well. From the get-go, Caddy was designed to be a server that faced the public internet and therefore fairly robust in the face of attacks and hostile environments. Whereas before I never would have counseled somebody to expose a KubeDNS to the outside world, a CoreDNS server I think is perfectly fine to expose to the outside world. And it's extensible. There's lots of capabilities and features that we can plug in to CoreDNS.
Christopher: So, let's talk about some of those. One, this is a drop-in replacement for KubeDNS. We can rewrite DNS entries, and I'll show you that in a minute. We can also configure it to trade off memory usage for external resolution time. So if you're resolving external DNS queries in KubeDNS and in CoreDNS, that can take some time. But it's very memory efficient. You can say you want to be a little less memory efficient, but you want the resolver times to be shorter, and you can configure that. And we'll talk about that in a minute.
Christopher: Also, external CoreDNS servers can be connected to one or more Kubernetes clusters and namespaces. So this becomes interesting when you have services in Kubernetes that want to be exposed to the rest of your organization, or even broader than that. Within Kubernetes, obviously, your CoreDNS servers are running and answering queries coming from within the cluster, but how do things outside of your cluster resolve those names?
Christopher: So you can actually run CoreDNS servers as just regular DNS servers like any other DNS server in the organization and point to them via your DNS root in your organization and delegate domains to them. And then those domains can be wired into different Kubernetes clusters and namespaces. So you can actually have a CoreDNS server that is going to answer queries for Kubernetes resources just as if it was in the Kubernetes cluster itself. In fact, it's getting the data from the same place as the in-cluster DNS servers. Everything's going back to the Kubernetes API server, or to etcd.
Christopher: So it's going to behave just like a Kubernetes DNS server, it's just going to be exposed outside of the cluster. [inaudible 00:23:47] gives you a very easy way of surfacing resources within Kubernetes on a DNS basis to things outside of the cluster itself. If we go to the ... I'm talking about CoreDNS autopath, this is a feature where you can adjust the trade-off between memory and resolution time. If you don't have this autopath feature turned on, DNS answers for entries within the cluster take about two milliseconds. If you try and resolve something outside of the cluster it can take up to twelve milliseconds. But you'll be using less than 200 megabytes per CoreDNS instance to do that, so your memory footprint will be fairly small, even in very large clusters where you have 150,000 or more pods and services.
Christopher: If you don't like that six-fold increase in time between resolving cluster.local and resolving google.com, you can turn on autopath, and what that's going to do is keep all resolutions down to about two milliseconds. So everything will resolve in two milliseconds now, give or take, but you are going to have an increase in memory utilization, a little bit over double the amount of memory utilized, because it's going to be doing a lot more caching and then there's some other things it's going to be doing in parallel. But you're going to be using more memory in order to get that performance. And now you can trade off: do you want your CoreDNS servers in your cluster to be using more memory and responding faster, or using less memory and responding a bit slower.
Christopher: The next thing you can do is DNS rewrites. So there's a whole bunch of things here. I encourage you, if you're interested in doing this, to look at the CoreDNS documentation, but you can do things like replace suffixes and domain names. In the second example there, we rewrite the name suffix .smoogle.com to .google.com, and you can create a smoogle.com zone in CoreDNS and then if somebody tries to resolve www.smoogle.com, CoreDNS will rewrite that to www.google.com and then resolve that. So you can now start using this to swap out names from, say, a Kubernetes name to a more externally friendly name to expose those services to the rest of your estate.
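The effect of a suffix rewrite on a query name can be shown with plain string handling. This is a model of the behaviour described in the talk, not CoreDNS code or configuration; the function name is invented for the example.

```python
def rewrite_suffix(qname: str, old_suffix: str, new_suffix: str) -> str:
    """Replace old_suffix with new_suffix when the query name ends with it."""
    if qname.endswith(old_suffix):
        return qname[: len(qname) - len(old_suffix)] + new_suffix
    return qname  # names outside the rewritten zone pass through untouched

print(rewrite_suffix("www.smoogle.com", ".smoogle.com", ".google.com"))
print(rewrite_suffix("www.example.org", ".smoogle.com", ".google.com"))
```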
Christopher: Similarly, you can even use reg-exes, regular expressions, to rewrite these in a more programmatic fashion. So in this case anything.uswest1.example.org is going to get rewritten to anything.service.uswest1.consul. So if you're using Consul as a name resolution service, or as a discovery service, you can map those Consul names into a corporate name.
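The regular-expression variant captures the leading part of the name and re-uses it in the rewritten name. Again this is a sketch of the behaviour using Python's `re` module, with the zone names taken from the talk and the pattern itself an assumption rather than a copy of the slide.

```python
import re

# Capture everything before ".uswest1.example.org" so it can be reused.
PATTERN = re.compile(r"^(.*)\.uswest1\.example\.org$")

def rewrite_regex(qname: str) -> str:
    """Map <anything>.uswest1.example.org to <anything>.service.uswest1.consul."""
    return PATTERN.sub(r"\1.service.uswest1.consul", qname)

print(rewrite_regex("ftp.uswest1.example.org"))
print(rewrite_regex("ftp.useast1.example.org"))  # no match, returned unchanged
```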
Christopher: So CoreDNS gives you a lot of capabilities. There's a lot more than just that, but those are some of the high points.
Christopher: The last thing I want to talk about is ExternalDNS. So ExternalDNS is an incubator project within Kubernetes. I would not use it yet in production, but if it's something of interest I'd play around with it. The API is not stable at this point, it's still an incubator project. The API might change. But what it allows you to do is attach an annotation to a Kubernetes pod that says I want this pod, or this service, to have a specific name. So in this case I want my NGINX pod, which if you'll remember is a service at nginx.whatevernamespace.svc.cluster.local, it will still get that, but it will also be given a name of nginx.example.org.
Christopher: -including AWS Route 53, including Google DNS, [inaudible 00:28:23]. There's a number of name servers that it works with. You configure ExternalDNS with what name servers it's supposed to manipulate, and the credentials it would need to do so. Then, whenever we grant a service IP to a given service where that service definition has this annotation, ExternalDNS will plop in a record, an A record or a quad-A record, for that service, in this case nginx.example.org, into the example.org name service, if it has the right permissions and that's accessible. So for example, if you're on AWS, it could update your example.org Route 53 records, or your Google Cloud DNS records, or if you're on-prem, private cloud, and you're running CoreDNS, it can update your CoreDNS server hosting example.org.
Christopher: So this now gives you the ability to assign external DNS names directly. So you can now assign external DNS names for your services using ExternalDNS, or CoreDNS rewrite rules, or a combination of the two. So there's a number of interesting things you can now start doing with DNS to make it easier to expose and consume your Kubernetes resources outside of the cluster.
Christopher: And I think that's about the end of my little quick tour through Kubernetes services, BGP announcements of services, and DNS.
Michael (Host): Thanks, Christopher. The next Tigera webinar is coming up in January. This was our first in the "Kubernetes Learning Series." It's about topics that are important to Kubernetes that aren't directly related to security and compliance, the core things that Tigera and Calico do.
Michael (Host): So the next episode is based on Kubernetes ingress and egress traffic management, and that's coming up in January, on Tuesday the 19th.
Michael (Host): A copy of the presentation will be made available to you by email. We'll send out a thank-you email from us and have a link to that, if you want to review this content. But once again, like I said, as soon as we end this webcast it will be available for replay on the BrightTALK network or at www.tigera.io/webinars.