Michael: Hello, everyone, and welcome to today’s webinar, “Understanding and Troubleshooting Kubernetes Connectivity Issues.” In this presentation today, we’ll show you how to identify, understand, and solve the most common Kubernetes connectivity issues. So before I introduce and hand things over to our speaker, Karthik Prabhakar, K.P., we have a few housekeeping items to cover on this presentation and the webinar platform.
Michael: I’m pleased to introduce today’s speaker. I’ve been doing this now for something close to 30 webinars here at Tigera, and this is the first time I’ve had the pleasure of having Karthik, otherwise known as K.P., join us. So welcome, Karthik. Karthik is the Senior Director of Solutions Architecture at Tigera. And you can see there everything he’s done around secure application connectivity and helping customers. So without further ado, let’s get into the presentation. Karthik?
Karthik P.: Thanks, Michael. And good morning, or good afternoon, or evening, depending on where everyone is. Looks like we have actually a pretty massive audience here, so glad to see all of you join. And, obviously, when you look at Kubernetes and some of the different frameworks for Kubernetes, for connectivity that Kubernetes presents, there [inaudible 00:02:01] abstractions that are used. And they’re used for different reasons, right?
Karthik P.: So there’s obviously things like making sure that your pods, which are your atomic unit of work in Kubernetes, can communicate with each other. So we’ll spend a little bit of time on that today. And because your pods are dynamic and transient, Kubernetes presents this abstraction called Services, where you have a nice abstraction in [inaudible 00:02:26] dynamic and transient pods. And we’ll explore that, as well. And how you find services is by leveraging constructs like DNS.
Karthik P.: But then beyond that, once you get past the basic connectivity, how do you secure your workloads? So that only the right workloads are allowed to talk to each other. You have the right level of isolation. That is afforded by network policy. Ingress is a construct that is commonly used for traffic coming into the cluster, especially if you’re looking for [inaudible 00:02:56] termination and whole standards. And that’s also something that’s commonly used in Kubernetes deployments.
Karthik P.: And increasingly over the last couple of years, two, three years, you actually find that a lot of microservices deployments in Kubernetes benefit from deploying a service mesh like Istio together with Kubernetes because of the value it provides in observability, traffic management, security, and so on. So what you see is a whole set of constructs that you get with Kubernetes. And so it’s no surprise that a lot of people are intimidated by looking at all of these constructs and trying to understand them.
Karthik P.: And certainly, when they have issues, troubleshoot them. And the way I look at this is when you look at a formal dinner setting, you look at all of the different choices available to you in terms of your silverware, right? Do you really need four forks? Do you need five knives? Do you need two spoons? And you could look at that in Kubernetes, as well. Do you really need all of these abstractions? And the way I tend to describe it is that each of these abstractions, each of the place settings, is there for a specific reason. You don’t need to use all of them. You can probably get by with just a fork and perhaps a knife, but if somebody places a soup bowl in front of you, you’re going to have a hard time using a fork to slurp up your soup.
Karthik P.: So that’s a similar way that I tend to look at Kubernetes connectivity abstractions, in that you don’t necessarily need to use all of these abstractions. Some of these might be sufficient, like going with basic pod and service connectivity and, perhaps, network policy. But, ultimately, as you start to deploy real-world applications, it’s best to understand where each of these abstractions fits best so that you can pick the right tool for the job.
Karthik P.: And so given that, we will not have time to cover all of these in sufficient depth in this webinar today. I’m going to split this up into multiple sessions and today we’ll focus on the core elements of pod connectivity using CNI. We’ll also talk about services and kube-proxy, both the newer IPVS-based kube-proxy and the iptables-based kube-proxy. And also, how DNS and service discovery work in the Kubernetes context. And then in a subsequent webinar, we will cover topics like network policy, ingress, and ingress gateways, as well as service mesh and related apps.
Karthik P.: That way we can spend sufficient depth on each of these topics. So that said, I’m going to do just a quick intro about Tigera, in case you haven’t heard of Tigera. Tigera has been involved in Kubernetes since the early days. It has been very focused on networking and network security within Kubernetes. Obviously, we’ve also been working upstream. And in fact, for some of the upstream core concepts of Kubernetes, like network policy and how CNI is used, Tigera has either direct [inaudible 00:06:01] in enabling some of these concepts.
Karthik P.: And in many cases, even maintenance, such as in the case of CNI. We have also been very involved with service mesh technologies, like Istio and Envoy. And today, when you look at a number of the very large Kubernetes deployments, the most advanced clusters, you generally find Calico widely deployed as part of that, Calico being Tigera’s flagship project. Now, skipping past the introduction here, for this particular webinar I’m going to make some assumptions on your profile.
Karthik P.: I’m going to assume that all of you on the webinar here have some basic Kubernetes awareness, if not in-depth awareness on the connectivity side. At least being able to understand concepts like kubectl, and the API server, and the concept of a pod. I’m going to assume you already know that. I’m assuming that means you’ve already used kubectl and you’ve already used some of the basic kubectl commands, like kubectl get pod, describe pod, get service, kubectl logs, and so on.
Karthik P.: And we’ll walk through that, as well. I’m going to assume familiarity with Linux, including basic connectivity tools, like ping and traceroute, which we’ll obviously use in troubleshooting. Being able to do lookups using dig and nslookup, being able to do ip route show to look at the Linux routing table, and just a basic understanding of Linux connectivity. How does connectivity work through Linux?
Karthik P.: We are specifically focusing on Linux. Tigera also provides connectivity via CNI, Calico CNI, as well as policy on Windows. But we’re not going to cover Windows in this particular webinar. Now, [inaudible 00:07:46] today is we’ll take a quick run through CNI and pod connectivity, how that works. We will use Calico as an example, using Calico CNI and Calico networking. Concepts presented here would be a little bit different with other CNI plugins. And we can certainly cover some of those in future webinars, as well.
Karthik P.: We’ll take a quick look at the live cluster and look at how we can troubleshoot basic pod connectivity issues. Then we’ll switch back and look at services in Kubernetes and kube-proxy, the basic concepts there. And then, again, having covered that, we’ll switch back to the live cluster and share how end-to-end connectivity works, including DNS, as well. So with that said, let’s jump into CNI. So the basic concept in Kubernetes, in a typical worker node, you have a Kubelet that is running as part of your Kubernetes cluster.
Karthik P.: In effect, the Kubelet is responsible for launching pods once the pods have been scheduled on a given worker node. And a pod ultimately is a set of containers that are serving an application with a common network namespace associated with that. So once a pod has been scheduled on a given worker node, the Kubelet is responsible for launching that pod. And specifically, to get that pod connected to the network, the Kubelet will call out to the CNI plugin. The CNI plugin is typically stored as a binary on the host with a configuration normally stored under /etc/cni/net.d.
Karthik P.: And what happens is the Kubelet launches the CNI plugin together with the IPAM, the IP address management, that goes hand in hand with the CNI plugin. In effect, it’s the job of the CNI plugin, by way of the IPAM plugin, to assign an IP address to that running pod. And essentially to take that virtual ethernet connecting that pod to the host, which the Kubelet has set up, and make sure that the connectivity works for that pod to the rest of the cluster and to all of the other pods in the cluster.
Karthik P.: So in a nutshell, what happens in the Calico CNI case is that the Kubelet calls out to the Calico CNI plugin and either the host-local or the Calico IPAM module. And when it’s asked to network this particular pod, it finds a way to assign an IP address to the pod. And then gets that pod connected to the network. The way it does that is obviously when you have a virtual ethernet that connects the pod into the host networking namespace, in effect, an IP address gets assigned by Calico IPAM.
Karthik P.: And it writes that workload ID to the data store, which happens to be either Kubernetes or it could be a standalone key-value store, like etcd. And in effect, what happens then is that now we need to get that pod connected to the rest of the network so that it can communicate with pods running on other worker nodes. And in effect, what happens in that scenario is that Calico CNI creates the workload ID in the data store. And separately, we have another Calico node agent, which is running as a [inaudible 00:11:07] so that is a pod that gets automatically started on every worker node in the cluster as the worker node bootstraps.
Karthik P.: And the Calico node agent has something called Felix, where Felix realizes, “Look, I have a new pod on my node and I need to create a little route in the Linux routing table that says, to reach this particular pod (in this case, the pod’s IP address is 184.108.40.206), I want to send traffic down the virtual ethernet assigned to the pod.” So that local connectivity within the node works. Separately, depending on the connectivity mode chosen in Calico. Calico gives you a variety of modes.
Karthik P.: We have agents running on all of the other nodes, as well. And if you happen to be using VXLAN, you’re using VXLAN encapsulation between nodes. In effect, the Felix agents on other nodes also discover the fact that this workload happens to be running on worker node one. And the Felix agent on worker node two essentially creates a route pointing to the pod IP, or in fact, an aggregate route pointing to a range of pod IPs. The next hop for that being worker node one, and pointing to vxlan.calico, a device that Calico creates in the Linux kernel.
Karthik P.: And ultimately, traffic that is sent to that vxlan.calico device gets encapsulated in VXLAN. Ethernet packets are encapsulated in IP packets with UDP port 4789. And [inaudible 00:12:42] across the wire. So in effect, what happens is all of the other nodes in the cluster now know how to reach this particular pod that has been described. Calico does support other modes of operation, as well. Another common encap mode that’s used is a simpler layer 3 encapsulation, rather than encapping at layer 2, which leads to less overhead.
Karthik P.: There’s something called IP in IP, which is simply encapping IP packets in other IP packets. So the packets coming in from the pod are encapped into a bigger IP packet. But in that case, using the source and destination IP of the hosts themselves. So all of this traffic flows seamlessly across any intermediate infrastructure as long as you have IP connectivity between hosts. And essentially, we have an internal protocol that we use. Specifically, we use BGP, using a daemon called BIRD, to communicate routes between different nodes.
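The overhead difference between the two encapsulation modes can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes an IPv4 underlay with a 1500-byte MTU (a figure not given in the talk) and uses the standard header sizes to estimate how much room is left for pod traffic in each mode. The `pod_mtu` helper is a hypothetical name.

```python
# Standard header sizes in bytes, assuming an IPv4 underlay.
OUTER_IPV4 = 20      # outer IP header added by both IP-in-IP and VXLAN
UDP = 8              # VXLAN rides on UDP (port 4789)
VXLAN_HEADER = 8     # VXLAN header carrying the network identifier
INNER_ETHERNET = 14  # VXLAN carries the whole inner Ethernet frame

ENCAP_OVERHEAD = {
    "none": 0,
    "ipip": OUTER_IPV4,
    "vxlan": OUTER_IPV4 + UDP + VXLAN_HEADER + INNER_ETHERNET,
}


def pod_mtu(underlay_mtu: int, mode: str) -> int:
    """Largest pod packet that fits in one underlay packet for a given mode."""
    return underlay_mtu - ENCAP_OVERHEAD[mode]


if __name__ == "__main__":
    # 1500 is an assumed underlay MTU, chosen only for illustration.
    for mode in ("none", "ipip", "vxlan"):
        print(mode, pod_mtu(1500, mode))
```

Under these assumptions IP in IP costs 20 bytes per packet and VXLAN costs 50, which is the sense in which the layer 3 encapsulation has less overhead.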
Karthik P.: And in effect, when traffic gets sent by the pod, the packets are encapped into these bigger packets and sent across the wire using IP in IP encapsulation. And each of the addresses there, 220.127.116.11, 18.104.22.168, are all just IPs given to pods. These are all derived from a cluster CIDR that is used when you deploy the Kubernetes cluster. And ultimately, are mapped into a pod CIDR, which is used by Calico IPAM to assign IP addresses. Calico also has the more sophisticated and advanced capability of being able to not use encapsulation at all and simply use normal IP addresses with no encapsulation.
Karthik P.: And in this case, if your worker nodes are all layer 2 adjacent in the same layer 2 segment, perhaps in the same availability zone within the cloud environment, or in the same subnet within your on-premise environment, or you’re able to peer Calico with your infrastructure. For example, you can peer your Calico node agents with things like your top of rack and be able to share your pod IPs as first-class IPs within your infrastructure.
Karthik P.: In this case, you’ll notice that the route actually points to dev eth0 rather than previously pointing to dev vxlan.calico or dev tunl0. In this case, what you’re doing is you’re simply using the Linux kernel for normal IP routing when it gets sent over the wire. And in this case, obviously, you have to make sure that your [inaudible 00:15:14] firewalls, things like your security groups, NSGs, do not have additional restrictions on traffic between pods because in this case we’re just using the pod IP to send the packet out over the wire.
Karthik P.: And this makes troubleshooting a lot simpler, but obviously you do need to make sure that your network infrastructure is able to carry the pod traffic from host to host. So those are your basic concepts and the different modes that Calico can work in, right? And when you deploy Calico as part of your Kubernetes cluster, you pick the manifest and you customize the manifest to specify the mode that you want to use for traffic between nodes. So let’s now take a quick look at how this actually works in practice.
Karthik P.: But before I do that, one other point I will mention before I get to that is ultimately in Kubernetes you assume that when pods come up, pods are given an IP address. Every pod has a unique IP. And there is seamless connectivity between pods. Now, in practice, you may not want all pods to talk to each other. And for that purpose, Kubernetes provides an abstraction called network policy. The Calico team actually has defined network policy upstream in Kubernetes. And fundamentally, you define your isolation primitives in a declarative manner. A little bit decoupled from your network connectivity.
Karthik P.: And ultimately, it’s the role of a network policy implementation, which Calico also provides, to be able to dynamically segment your applications from each other as these pods come and go. And for this purpose, essentially what Calico does is leverage the Calico node agent to take this declarative policy and instantiate it using whatever mechanism, whether it’s iptables and ipsets, or XDP at a lower layer in the Linux kernel stack, to be able to dynamically create rules for isolation at the point at which the pod connects into the host.
Karthik P.: So you’re already isolating the pod even before it connects to the host. And optionally, Calico can also do this policy-driven protection for the host interfaces as well, to not just protect pods, but also to protect services in Kubernetes, such as node ports, as well as other host network applications, as well as protecting the Kubernetes control plane itself. But that is a topic for a different webinar. But now, let’s go look at pod connectivity and how we troubleshoot that.
Karthik P.: So given this flow of traffic from pod to pod, if you look at the pod sending traffic to the host, and the host routing traffic to a remote host, ultimately, there are different places where things can break. First of all, the pod might show up and it might not get an IP address. So how do you troubleshoot that? Typically, you would want to look at the CNI plugin concerned and the IPAM module.
Karthik P.: And the way you would do that is by looking at the Kubelet logs because the Kubelet is what is calling these binaries. Second, is the pod sending traffic to the right IP address, right? If you’re trying to reach another pod’s IP over the network, is the pod sending traffic over the virtual ethernet and is it getting to the host? Third, if it’s getting to the host, does the host have the right routes to be able to route traffic to the remote host to get to the other pod? In other words, is the routing table correct? The fourth thing you will look at is, assuming the routing table is correct, has the traffic been encapsulated correctly.
Karthik P.: Or if it hasn’t been encapsulated, is it still being sent out over the virtual wire? Is it going out over the ethernet interface? The fifth place you will look at is if it’s being sent out over the wire, is it being received on the remote node? In other words, has your network infrastructure in the middle actually carried the traffic all the way over to the remote node? Do you have a security group or NSG that is preventing traffic? Do you know how to [inaudible 00:19:10] firewall? Is your intermediate router dropping traffic?
Karthik P.: So if the traffic actually gets to the remote node, that’s a good sign. So traffic has actually made it all the way to the destination node. And at that point, is it actually being routed? So does the destination node have the right set of routes and is it sending it to the destination pod? So that’s ultimately the last step you would look at. And if that works, then guess what? For the return traffic back from the destination pod back to the source, you would essentially retrace the steps, but in the opposite direction.
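The ordering of those checks matters, because each one only makes sense once the previous one has passed. As a rough sketch, the sequence can be written down as an ordered checklist that reports the first failing step. The check names and hint strings below are paraphrases invented for illustration, and `first_failure` is a hypothetical helper, not part of any tool shown in the talk.

```python
# Ordered checklist paraphrasing the troubleshooting steps described above.
# Keys and wording are illustrative only.
CHECKS = [
    ("pod_has_ip", "Pod has no IP: look at the kubelet logs for CNI/IPAM errors"),
    ("pod_reaches_host", "Traffic is not leaving the pod over its virtual ethernet"),
    ("host_routes_ok", "Source host routing table lacks a route to the remote pod"),
    ("leaves_host", "Traffic is not encapsulated correctly or not leaving the host interface"),
    ("arrives_remote", "Traffic is lost in between: check security groups, firewalls, routers"),
    ("remote_routes_ok", "Destination host is not routing the traffic to the destination pod"),
]


def first_failure(results: dict) -> str:
    """Return the hint for the first failed check, walking them in order.

    `results` maps a check name to True (passed) or False (failed);
    a missing entry is treated as not yet verified, i.e. a failure.
    """
    for name, hint in CHECKS:
        if not results.get(name, False):
            return hint
    return "Forward path looks fine: retrace the same steps for the return traffic"


if __name__ == "__main__":
    observed = {"pod_has_ip": True, "pod_reaches_host": True, "host_routes_ok": False}
    print(first_failure(observed))
```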
Karthik P.: So let’s do this. Let’s go take a look at a live cluster to see how pod connectivity works before we get to the services topic. So right here is a cluster with a couple of worker nodes and a master node. So if I do a kubectl get pods --all-namespaces, you notice I have Calico networking that is running on that cluster. My cluster has been provisioned with CoreDNS as the DNS plugin. We’ll come back and discuss that in a few minutes. And we also have things like kube-proxy.
Karthik P.: And I know that some of you may want this to be bigger. For now I’m going to leave the font smaller because this is actually not the screen I want to focus on. I’ll come back to a different screen. Sorry. Michael just told me I am not doing a screen share. I forgot to [crosstalk 00:20:40]
Karthik P.: So I have a cluster here with three nodes. And if I do a kubectl get pods --all-namespaces, as you can see, I have Calico running in my Kubernetes cluster providing connectivity. I have DNS being fulfilled by CoreDNS and I have kube-proxy, standard upstream Kubernetes kube-proxy, providing the services connectivity, right? So that’s what’s running in that cluster. [crosstalk 00:21:27] a couple of worker nodes.
Karthik P.: I can go look at these worker nodes, as well. But if I want to do things like watch my Kubelet logs on my worker nodes to see what IPs are being assigned, I would look at the Kubelet logs and just typically observe using journalctl. So essentially you would do a journalctl [inaudible 00:21:51] these are the various Kubelet logs which are running on that particular node. So now, let’s do something interesting. Let’s go launch a couple of applications. So I’ll do …
Karthik P.: I’m just going to deploy an NGINX application together with a CentOS pod as the client. And in effect, I’ve created the CentOS pod as the client. And I’ve created a couple of replicas of NGINX within the NS1 open namespace. And a service that is fronting that. So you’ll notice that I have the CentOS pod that is running, that’s got an IP address from the pod CIDR range. So the 10.23 range is my pod CIDR. And a couple of NGINX replicas, which are part of a deployment and fronted by a service called open-NGINX.
Karthik P.: So now, if I go back and look at where these pods have been scheduled, I looked at my Kubelet logs on this particular node, right? Some of these pods got scheduled. You’ll notice that Calico provides some detailed information in the logs in terms of the IP address assignment, the fact that these pods are getting connected. So if the pod shows up and it doesn’t get an IP address, that is the first place that you would look to see what happened. Why did the pod not get an IP address?
Karthik P.: And typically, it would lead to something like [inaudible 00:23:13] configuration error in your Calico manifest, that you haven’t configured it correctly. And where you would want to look for that is under /etc/cni/net.d. And what you should see there is the different configurations, which Calico automatically sets up as a DaemonSet. So this is all automatically set up by the Calico manifest when you deploy a cluster. So in effect, the configuration for what you’re deploying is stored here.
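When checking that directory, the two things worth confirming are which CNI plugin is configured and which IPAM module it hands off to. The sketch below parses a CNI network configuration list and pulls out exactly those two fields. The sample JSON is a made-up minimal example in the CNI conflist format, not the configuration from the cluster in the demo, and `summarize_plugins` is a hypothetical helper name.

```python
import json

# Illustrative sample only: a minimal CNI network configuration list.
# The real file under /etc/cni/net.d on a Calico node has more fields.
SAMPLE_CONFLIST = """
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "calico", "ipam": {"type": "calico-ipam"}},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
"""


def summarize_plugins(conflist_text: str) -> list:
    """Return (plugin type, IPAM type or None) for each chained plugin."""
    conf = json.loads(conflist_text)
    return [
        (plugin["type"], plugin.get("ipam", {}).get("type"))
        for plugin in conf.get("plugins", [])
    ]


if __name__ == "__main__":
    for plugin_type, ipam_type in summarize_plugins(SAMPLE_CONFLIST):
        print(f"plugin={plugin_type} ipam={ipam_type}")
```

On a real node the same function could be pointed at the contents of the file found in that directory. A missing or malformed file there is one plausible reason for a pod never getting an IP address.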
Karthik P.: And typically, all of this is seamlessly configured by Calico, including the access to things like kubeconfig, and so on. Now, the next thing you’ll want to check is, in this case, since my pod came up and an IP address was assigned correctly, how do I reach other pods in the network? So let’s do this. So let’s do a kubectl exec into the pod. So we’re going into the pod which is running in the NS1 open namespace. And the pod name is CentOS open. All right, so now we are inside the pod’s namespace.
Karthik P.: And now, we have to figure out whether we can connect to other pods. So let’s say we’re trying to connect to the CoreDNS pod. That is the IP address of the CoreDNS pod. We can try pinging it. Guess what? We’re able to ping it. So that’s great because now we’re able to connect from this CentOS pod to other pods within the infrastructure. And the way this actually works is ultimately based on Linux doing the routing from the pod network namespace using the standard Linux routing table and being able to route to other nodes.
Karthik P.: So let’s look at what that looks like. In this case, in this particular cluster, I deployed it using VXLAN. So if I look at my cluster, essentially what I have is I have [inaudible 00:25:14] interfaces. If I do an ip address show, essentially I have the ethernet interface, which is ens5, in my case. The docker bridge is unused by Calico, so we ignore that. And ignore the kube-ipvs. That’s the services. We’ll come to that in a second, but for every pod that is running on that [inaudible 00:25:35] Calico creates a cali interface.
Karthik P.: So since I have three pods on this host, I have cali interfaces for each of them. And in addition, I have a VXLAN interface that’s set up for encapping my VXLAN because I deployed the cluster using VXLAN. So if I go look at the host, if I do an ip route show, in effect, what Calico has set up is a /32 route for every pod running on that particular host, pointing to the particular cali interface. Similarly, Calico, the Felix agent, has also set up an aggregate route, as you see here, for traffic destined to pods on remote hosts.
Karthik P.: So a /26 route pointing to the host that that pod happens to be running on. In this case, it points to that remote VXLAN endpoint on that host and says send it out the particular device, vxlan.calico, so that the kernel actually does the VXLAN encapping. Often, you want to make sure that things like your MTU are correct. These are also things that you can configure in your Calico manifest. Most of these typically have sane defaults, which are pretty safe. But generally speaking, that’s all there is to it in terms of making sure that your pod connectivity between hosts works correctly.
Karthik P.: It is simply to make sure that you have the /32 routes for the individual pods. And that for pods on remote hosts, you have a route in the Linux routing table pointing to the corresponding hosts, right? So if those do not exist, what do you do? Then you can actually go back and look at a couple of things. You can look at, first of all, what are my logs for my Calico nodes. Are there issues? Did I have issues in the Calico CNI plugin? And if the routes don’t exist, I can go look at the logs for my Calico node pod to see if there are any errors there.
Karthik P.: So that’s what I would do in that case, is do a kubectl logs. And in effect, if there are any errors in creating routes, if there are any errors in connecting pods to each other, you will see error logs rather than info logs, where Calico node says, “I’m having a problem connecting with this pod.” It might be a warning, as well. So those are the things you’d want to watch out for. And as you can see here, generally when things work well you get informational logs, and things are very good because now you have pod connectivity between the CentOS pod and other pods in the infrastructure.
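The routing table described above, with /32 routes for local pods and /26 aggregates for pods on remote hosts, works because the kernel always picks the most specific matching route. A minimal sketch of that longest-prefix match follows. All addresses, interface names, and the `lookup` helper are invented for illustration (only the 10.23 pod range, the 172.24.20 host range, `ens5`, and the `vxlan.calico` device name echo the demo), and a real kernel lookup also considers metrics and policy routing, which are ignored here.

```python
import ipaddress

# Hypothetical routing table in the shape described above:
# (destination prefix, outgoing device, next hop or None if directly attached)
ROUTES = [
    ("10.23.5.7/32", "cali1234abcd", None),           # local pod
    ("10.23.5.8/32", "cali5678efab", None),           # local pod
    ("10.23.9.64/26", "vxlan.calico", "10.23.9.64"),  # pods on a remote host
    ("0.0.0.0/0", "ens5", "172.24.20.1"),             # default route
]


def lookup(dest: str):
    """Return the most specific route matching dest, or None if none match."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, dev, via in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, dev, via))
    if not matches:
        return None
    # Longest prefix wins, so a /32 beats a /26, which beats the default.
    net, dev, via = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), dev, via


if __name__ == "__main__":
    for dest in ("10.23.5.7", "10.23.9.70", "8.8.8.8"):
        print(dest, "->", lookup(dest))
```

If a remote pod's address only ever matches the default route in a lookup like this, the aggregate route for its host is missing, which is the situation where the Calico node logs are the next place to look.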
Karthik P.: So let’s do one more thing. Let me go look at a different cluster, where I have Calico deployed with straight IP without VXLAN encapsulation. And look at what the routing table looks like here. So if I do the same thing here, I have … Let me launch the same application.
Karthik P.: [silence 00:28:41]
Karthik P.: And what you’ll see is the app starts up here with the same services and so on. And if I go look at the routing table here on this particular cluster, notice that I now have the same /32 routes, and that same /26 route. But in this case, it is pointing directly to the ens interface rather than pointing to the VXLAN interface. So in this case, the Calico nodes are sharing the routes with each other using the BIRD daemon. The same aggregate routes, but in this case, it is pointing directly to the destination host IP address and pointing directly to the ethernet interface without encapsulation.
Karthik P.: So it’s actually being routed by Linux with no encapsulation. So now, if you’re using Calico with no encapsulation, there are no packets in packets to troubleshoot. You’re simply troubleshooting the pod IP packets all the way from source to destination. It also means that if you have things like security groups or intermediate firewalls, you need to make sure they are allowing the protocols and traffic that you want to allow between nodes. So you need to have the right holes in your external firewalls, if you have those. All right, so let’s come back to the next set of troubleshooting steps. So we talked about making sure the pods have an IP.
Karthik P.: We talked about, once the pods have an IP, checking to see that the pods are able to talk to other pods, whether they are on the pod network, with the pod CIDR range, or whether they’re on the host network. In my case, my hosts happen to have that 172.24.20 range. And that connectivity works between any set of pods, right? Assuming you didn’t have policy defined. Now, so far, everything is good in our scenario and pods are able to talk to each other. So now, what is the next scenario we need to consider? And for that, I’m going to come back and stop sharing and go back to the slides here.
Karthik P.: So as we know, in Kubernetes pods are dynamic and transient. So pods come and go. It’s great to make sure that connectivity between pods works, because that’s ultimately the unit of communication. But that’s not sufficient because if a pod dies, your application should not be affected. And for that purpose, Kubernetes provides an abstraction called services. And you define a service in a typical declarative fashion in Kubernetes using a manifest.
Karthik P.: And when you declare it, you use a selector that ties that service to a particular deployment. So in this particular example, the selector, app equals my app, will allow the service to be backed by any deployment that has that same label, i.e. app equals my app. And ultimately, what the service is saying is that I want the service to be on port 80, but I want you to map to port 9376 on the pod, which is the target port that I’m using.
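The selector and port mapping just described can be sketched in a few lines. This is a toy model only: the service and pods are plain dictionaries rather than real Kubernetes objects, the pod names and IPs are made up, and `endpoints_for` is a hypothetical helper. The label value, port 80, and target port 9376 come from the example on the slide.

```python
# Toy model of the service from the example: selector app=MyApp,
# service port 80 mapped to target port 9376 on the pods.
SERVICE = {
    "name": "my-service",
    "selector": {"app": "MyApp"},
    "port": 80,
    "target_port": 9376,
}

# Hypothetical pods; names, extra labels, and IPs are invented.
PODS = [
    {"name": "pod-a", "labels": {"app": "MyApp"}, "ip": "192.168.1.10"},
    {"name": "pod-b", "labels": {"app": "MyApp", "tier": "web"}, "ip": "192.168.2.11"},
    {"name": "pod-c", "labels": {"app": "Other"}, "ip": "192.168.2.12"},
]


def endpoints_for(service: dict, pods: list) -> list:
    """Return ip:target_port for every pod whose labels match the selector."""
    selector = service["selector"]
    result = []
    for pod in pods:
        # A pod matches when it carries every key/value pair in the selector;
        # extra labels on the pod do not matter.
        if all(pod["labels"].get(k) == v for k, v in selector.items()):
            result.append(f"{pod['ip']}:{service['target_port']}")
    return result


if __name__ == "__main__":
    print(endpoints_for(SERVICE, PODS))
```

Traffic sent to the service on port 80 would end up at one of the returned `ip:9376` endpoints.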
Karthik P.: And ultimately, what happens here is that when a pod is sending traffic to a service, the service is assigned a virtual IP. In this example, I’m using 172.16.0.5. It’s a consistent virtual IP, or cluster IP, that fronts the service. The service itself is backed by these very dynamic and transient pods. And each pod, as you saw in my previous example, has a dynamic IP. In my example, it was a 10 dot address, 10.23, but in this particular example the pods have 192.168 addresses.
Karthik P.: And as pods come and go, the role of a service is to keep track of all of the individual pod endpoints, but dynamically map the 172.16.0.5 cluster IP to one of these backend pods. And ultimately, when a different application, like pod B, tries to reach a service cluster IP, this mapping is dynamically maintained so that the application connectivity works even if one of the pods happens to die behind the scenes. And this function in Kubernetes is performed by kube-proxy. So kube-proxy, in effect, what it does is it watches the service in Kubernetes and watches the state of endpoints.
Karthik P.: And in effect, a service determines through liveness and readiness checks whether a given endpoint pod is alive or dead. So Kubernetes has this concept of liveness checks and readiness checks, where it is constantly checking, based on what you configured in your application manifest, to determine whether this pod is alive or dead. The service also dynamically watches that state and, ultimately, kube-proxy is watching the service. And it dynamically creates NAT rules in either iptables or IPVS to manage that mapping between the service IP and the actual pod IP.
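The behavior described above, where the cluster IP keeps working while backends come and go, can be modelled as a small table that is rebuilt whenever endpoint readiness changes. This is a simplified sketch, not kube-proxy's actual implementation: `ProxyTable`, `sync`, and `dnat` are hypothetical names, the endpoint addresses are invented, and backend selection is reduced to a plain random choice.

```python
import random


class ProxyTable:
    """Toy stand-in for the NAT rules kube-proxy maintains per service."""

    def __init__(self):
        # (cluster_ip, port) -> list of ready "pod_ip:target_port" backends
        self.rules = {}

    def sync(self, cluster_ip: str, port: int, endpoints: list) -> None:
        """Rebuild the rule for one service, keeping only ready endpoints."""
        self.rules[(cluster_ip, port)] = [
            ep["addr"] for ep in endpoints if ep["ready"]
        ]

    def dnat(self, cluster_ip: str, port: int, rng=random):
        """Pick a backend for a new connection, or None if nothing is ready."""
        backends = self.rules.get((cluster_ip, port), [])
        if not backends:
            return None
        return rng.choice(backends)


if __name__ == "__main__":
    table = ProxyTable()
    endpoints = [
        {"addr": "192.168.1.10:9376", "ready": True},
        {"addr": "192.168.2.11:9376", "ready": True},
    ]
    table.sync("172.16.0.5", 80, endpoints)
    print(table.dnat("172.16.0.5", 80))

    # One pod fails its readiness check; after a resync it gets no traffic.
    endpoints[1]["ready"] = False
    table.sync("172.16.0.5", 80, endpoints)
    print(table.dnat("172.16.0.5", 80))
```

The client keeps talking to 172.16.0.5 on port 80 throughout. Only the set of backends behind it changes.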
Karthik P.: So in effect, for a cluster IP, kube-proxy is dynamically creating these iptables rules. So that’s how you get this mapping from the 172.16.0.5 to one of the pod IPs. With kube-proxy, there’s a second mode called node ports. In effect, because the cluster virtual IP is not typically accessible from outside the nodes themselves, if you’re trying to access your Kubernetes service from outside the cluster, Kubernetes assigns a dynamic port. Typically in the range of 30,000 to 32 [crosstalk 00:34:23] proxy also then creates rules for any traffic coming into the ports from outside.
Karthik P.: On one of those ports that Kubernetes has assigned, to be dynamically NATed to one of the pod IPs. And if the pod happens to be on a remote node from the one on which the external traffic is coming in, kube-proxy will also automatically [inaudible 00:34:46] the traffic so that the return traffic also ingresses the same way. Ultimately, the function of a service is to provide a well-defined connectivity abstraction in front of pods. Ultimately, pods themselves are dynamic and transient. So in Kubernetes, there are four different service types in common use.
Karthik P.: So I’ll ignore none for now because that’s not very commonly used. The default in Kubernetes when you create the service is to use something called a cluster IP. In effect, what happens is Kubernetes assigns an IP from a particular range called the service CIDR range. Keep in mind, the service CIDR range needs to be distinct and different from the pod or cluster CIDR range, which is used for pods. And in effect, as a service gets created, Kubernetes will assign one IP address from that service CIDR range for that particular service.
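The requirement that the two ranges be distinct is easy to check mechanically. The sketch below uses Python's standard `ipaddress` module. The prefix lengths are assumptions picked for illustration, since the talk only mentions a 10.23 pod range and a 10.96.0.10 service IP, and `cidrs_overlap` is a hypothetical helper.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


if __name__ == "__main__":
    # Assumed values: only the 10.23.x.x pod range and the 10.96.0.10
    # service IP appear in the talk; the /16 and /12 are guesses.
    pod_cidr = "10.23.0.0/16"
    service_cidr = "10.96.0.0/12"

    print("overlap:", cidrs_overlap(pod_cidr, service_cidr))

    # A cluster IP must fall inside the service CIDR and outside the pod CIDR.
    dns_service_ip = ipaddress.ip_address("10.96.0.10")
    print(dns_service_ip in ipaddress.ip_network(service_cidr))
    print(dns_service_ip in ipaddress.ip_network(pod_cidr))
```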
Karthik P.: And that particular IP is called the cluster IP. Kube-proxy notices this and dynamically will create the rules in iptables or IPVS. The more widely deployed implementation of kube-proxy uses iptables, but there’s also a newer implementation that uses IPVS. And we’ll actually explore more. The other type of service is, like I said, a node port, when you need to access the service from outside the cluster. And yet another type of service is something called a load balancer.
Karthik P.: And in this case, Kubernetes supports a number of well-known cloud load balancers, where in addition to creating the cluster IP, Kubernetes also assigns an external IP. It can also talk to one of these cloud load balancers to dynamically create the load balancing rules on this external load balancer, so that those external load balancers send traffic to the appropriate node, where kube-proxy on the node handles the correct external IP addresses and the NATing for that, as well. So let’s go take a look at how this …
Karthik P.: Actually, before we dig into kube-proxy, kube-DNS is ultimately how you would do a service lookup in Kubernetes, how you would discover services. The way this works is when you bootstrap the cluster, normally you have an implementation that’s providing DNS in your cluster. In the current versions of Kubernetes, that happens to be CoreDNS, which is the implementation for kube-DNS. So, in this case, what you have is a handful of pods. In my example, I’m showing two pods, which are running the CoreDNS application, providing the DNS server functionality.
Karthik P.: And they are backed by a particular service IP. In this case, I’ve assigned the IP address 10.96.0.10, right? So ultimately, these multiple kube-DNS/CoreDNS pods are backed by a well-defined service IP. So anyone in the cluster that does a lookup at this particular service IP will get a DNS resolution of what they’re looking for. The second thing that happens in Kubernetes from a DNS perspective is, as pods come up, within the /etc/resolv.conf inside the pod,
Karthik P.: Kubernetes automatically sets up the correct values for the right name server to use. So in this example, the name server is given the service IP for kube-DNS, which is 10.96.0.10. And also, it sets up the correct search path to use. So if you notice, the search path now includes the particular namespace that this pod is running in, .svc.cluster.local, followed by svc.cluster.local, followed by cluster.local. So when an application in the pod does a lookup of a service name without giving it a fully qualified service name, in effect, that service name also gets suffixed with the fully qualified domain name for that particular namespace, .svc.cluster.local.
Karthik P.: And that request gets sent to CoreDNS. And so CoreDNS is actually able to find the correct service IP for that particular namespace and the service within that namespace. Secondly, if that particular service is in a remote namespace, then ultimately, that search path can help find the service in the remote namespace. But by default, you’re always looking up the local namespace first. And fundamentally, what happens with CoreDNS and kube-DNS is the resolution gets set up such that, as services get created, the IP address for the service, or the cluster IP, or other service discovery artifacts, can be discovered using DNS.
Karthik P.: DNS doesn’t have to be the only way you do service discovery, but it is a very common way that people do service discovery in Kubernetes. So let’s go take a look at my cluster again to see how this actually works in practice.
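The search-path behavior described above can be sketched in a few lines of Python. This is only an illustration of how a resolver expands a short service name, not code from the webinar. The namespace `demo` and the hyphenated service name `open-nginx` are hypothetical, and the `ndots` value of 5 is the usual Kubernetes pod default rather than something stated in the talk.

```python
def search_list(namespace, cluster_domain="cluster.local"):
    """Search domains Kubernetes writes into a pod's /etc/resolv.conf."""
    return [
        f"{namespace}.svc.{cluster_domain}",  # local namespace first
        f"svc.{cluster_domain}",              # then any namespace
        cluster_domain,                       # then the cluster domain
    ]


def candidate_names(name, namespace, ndots=5):
    """Return the names a resolver would try, in order.

    Assumption: standard resolver behavior, where a name with fewer than
    `ndots` dots is tried with each search domain before being tried as-is.
    """
    if name.endswith("."):
        # Already fully qualified: no search domains are applied.
        return [name]
    suffixed = [f"{name}.{domain}." for domain in search_list(namespace)]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + suffixed
    return suffixed + [absolute]


if __name__ == "__main__":
    # "demo" and "open-nginx" are hypothetical names for illustration.
    for candidate in candidate_names("open-nginx", "demo"):
        print(candidate)
```

A lookup for a service in another namespace, written as `service.namespace`, matches on the second entry of the search list, which is why the local namespace is always tried first.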
Karthik P.: [silence 00:40:16]
Karthik P.: All right, so coming back to my cluster here. In this particular case, I have my CentOS pod, which is a simple client pod that I’m using, and I have two NGINX pods, which are both being fronted by a service. So if I do a kubectl get service, dash NS1 dash open, notice I have a service called open NGINX that I’ve created, mapped to these two pods. It happens to be a node port, so it’s got a port in that particular range. In this case, 30,000 [inaudible 00:40:52]
Karthik P.: But, also, there is a cluster IP that’s assigned. And notice that the cluster IP, 10.101, comes from the 10.96.0.0 range, which is completely distinct from the 10.23 pod range. So you have to make sure that your pod CIDR range and your service CIDR range are completely distinct. The pod CIDR is what is used for the pods. The service CIDR is what is used for the services. And the service CIDR range here happens to be the 10.96.0.0/12 range, which is what this address gets assigned from.
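The "pod range and service range must be distinct" rule is easy to check with Python’s standard `ipaddress` module. The sketch below is an illustration only. The /16 on the 10.23 pod range is an assumption, and the service range is taken as 10.96.0.0/12 so that it contains the 10.101.204.227 cluster IP seen in the demo.

```python
import ipaddress

# Assumed values: the talk only gives "10.23" for pods, so /16 is a guess.
POD_CIDR = ipaddress.ip_network("10.23.0.0/16")
# A /12 starting at 10.96.0.0 covers 10.96.0.0 through 10.111.255.255.
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")


def ranges_are_distinct(pod_cidr, service_cidr):
    """Return True when the two ranges share no addresses."""
    return not pod_cidr.overlaps(service_cidr)


def classify(address):
    """Say which range an IP belongs to: 'service', 'pod', or 'other'."""
    ip = ipaddress.ip_address(address)
    if ip in SERVICE_CIDR:
        return "service"
    if ip in POD_CIDR:
        return "pod"
    return "other"


if __name__ == "__main__":
    print(ranges_are_distinct(POD_CIDR, SERVICE_CIDR))
    print(classify("10.101.204.227"))  # cluster IP from the demo
    print(classify("10.23.186.196"))   # pod IP from the demo
```

When you are reading addresses off the wire or out of a rule dump, this kind of classification tells you quickly whether you are looking at a virtual service IP or a real pod IP.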
Karthik P.: Now, let’s look at the service connectivity again. First of all, in this particular cluster, I’m using IPVS for the kube-proxy implementation. And so if I do an ipvsadm -Ln, and I also add statistics, notice that what I have here is IPVS telling me that, look, here’s my IP address and port. In this case, the service IP that’s been assigned. In this case, it’s a node port, TCP port 30080. Similarly, if I look for the specific cluster IP that’s been assigned, 10.101.204, you will find the particular service.
Karthik P.: There it is. So you find the 10.101.204 and it gets mapped dynamically to one of the pods that are backing that service. And ultimately, I get statistics for input and output bytes, because this is being used for load balancing among the different pods using a variety of IPVS load balancing algorithms. In my case, I’ve just set up round robin. Ultimately, traffic from individual clients is load balanced among the different destination pods.
Karthik P.: But what IPVS does is it sets that up using the [inaudible 00:42:51] mechanism in Linux to be able to NAT that service IP to the pods backing it. So on the wire, when you see a pod reaching out to a service IP destination, what you’ll ultimately see is the source IP address being the client pod’s IP. And the destination IP on the wire will be one of the destination pod IPs. You will not see the destination service IP on the wire. And ultimately, the traffic goes from pod to pod.
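To make the "you will not see the service IP on the wire" point concrete, here is a toy Python model of destination NAT with round-robin backend selection. It is a sketch of the idea, not how IPVS is implemented: real IPVS schedules per connection in the kernel, while this model picks a backend on every call. The client IP and the second pod IP are hypothetical.

```python
import itertools
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str


class VirtualService:
    """Toy model of a cluster IP fronting a set of pod IPs."""

    def __init__(self, cluster_ip, pod_ips):
        self.cluster_ip = cluster_ip
        # Round robin: cycle through the backing pods in order.
        self._next_pod = itertools.cycle(pod_ips)

    def dnat(self, packet):
        """Rewrite the destination if it is the cluster IP; keep the source."""
        if packet.dst != self.cluster_ip:
            return packet
        return packet._replace(dst=next(self._next_pod))


if __name__ == "__main__":
    # 10.23.100.10 and 10.23.50.5 are made-up addresses for illustration.
    service = VirtualService("10.101.204.227", ["10.23.186.196", "10.23.100.10"])
    for _ in range(3):
        print(service.dnat(Packet(src="10.23.50.5", dst="10.101.204.227")))
```

The source address is untouched and only the destination changes, which is why a packet capture shows pod-to-pod traffic even though the client addressed the service.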
Karthik P.: So you’re following the existing steps I already described for troubleshooting pod connectivity. So now, I do a kubectl exec into my CentOS pod and I try a few things. First of all, I’m going to do a curl on the service name. So it is open NGINX. Or is it NGINX open? I forget. The service name was open NGINX. Sorry, open NGINX. And if I do that, guess what? It worked. I’m able to connect to NGINX. So what happened there? First of all, I did a curl with the service name.
Karthik P.: The fact that I used the name and not an IP address tells me that DNS is working. How did it work? Let’s do a cat ETC resolv.conf within the pod. What you see here is the name server’s cluster IP is actually used as a target. So in other words, my pod has tried to talk to the cluster IP of the DNS name server. That cluster IP gets mapped to one of the core DNS pod IPs. So that mapping already worked. And then beyond that, that result came through.
Karthik P.: So core DNS was actually able to correctly resolve the IP and send the actual IP. So if I do a data install … And you wouldn’t normally install these inside a pod. I’m just doing it for the sake of showing you how this works. Okay.
Karthik P.: Actually, let me see if I can … I should’ve used the back of the client pod to trouble [crosstalk 00:45:15] DNS lookups did work. So if you actually looked at the DNS response that you get when you look up open NGINX suffixed by this particular namespace, .svc.cluster.local, the response back from kube-DNS, i.e. CoreDNS, is to give you the service IP that has been assigned to that particular service. In this case, it will be the particular service IP for the NGINX open [inaudible 00:45:45] service, which is the 10.101.204.227.
Karthik P.: And then the next step was that I did a curl. Ultimately, that is what gets looked up. And then this particular pod, the CentOS pod, sends traffic to the 10.101.204 address, which gets mapped into the pod IP of one of the NGINX pods. In this example, it would be one of these. And then you would see the traffic flow on the wire with the source of the CentOS pod and with a destination of this particular pod. And if you want to troubleshoot when that connectivity doesn’t work, ultimately, you’re going back to pod IP connectivity troubleshooting.
Karthik P.: Which is by doing things like ip route show, making sure that the routes exist. So in this case, if my destination is 10.23.186.196, this is the particular route that will be used. So that’s how you do connectivity troubleshooting for cluster IPs. If I was reaching out to the node port, ultimately in the node port case you’re sending traffic to the host IP. And in this case, the destination is this particular host. The traffic will be destined to that particular IP address, but on port 30080. And ultimately, kube-proxy simply sets up the rules in IPVS.
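Checking "which route will be used" amounts to a longest-prefix match against the node’s routing table. The Python sketch below shows that selection logic on a made-up table. The prefixes and device names are hypothetical and are not the output of `ip route show` from the demo cluster.

```python
import ipaddress

# Hypothetical routing table for one node: (prefix, outgoing device).
ROUTES = [
    ("0.0.0.0/0", "eth0"),                    # default route
    ("10.23.186.192/26", "tunnel-to-node2"),  # block of pod IPs on another node
    ("10.23.50.5/32", "cali-local"),          # a pod running on this node
]


def lookup(destination, routes=ROUTES):
    """Return (network, device) for the most specific matching route.

    Returns None when no route matches, which is the "route is missing"
    case you would be looking for while troubleshooting.
    """
    ip = ipaddress.ip_address(destination)
    matches = []
    for prefix, device in routes:
        network = ipaddress.ip_network(prefix)
        if ip in network:
            matches.append((network, device))
    if not matches:
        return None
    # Longest prefix wins, as in the kernel's route selection.
    return max(matches, key=lambda match: match[0].prefixlen)


if __name__ == "__main__":
    print(lookup("10.23.186.196"))  # pod IP from the demo
    print(lookup("10.23.50.5"))
```

If the only match for a pod IP is the default route, the more specific pod route is missing on that node, and that is usually the thing to chase.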
Karthik P.: Now, one thing to add to that is that kube-proxy’s IPVS implementation still uses iptables for things like node ports. And there are specific reasons for that. So if I do an iptables-save -t nat and pipe it to less, notice that there are still some rules that are being created by kube-proxy for things like node ports. And specifically, these are to allow Kubernetes to use the best facilities available in the Linux kernel for a given function. IPVS is great for load balancing. It is not great for things like SNAT. So that’s one reason why you’ll find that kube-proxy still uses iptables for SNAT, for things like node ports.
Karthik P.: But at the same time, the IPVS implementation always leads to a fixed set of rules, independent of the number of services. So it still has some really good scalability characteristics, even as you get to many tens of thousands of services. Now, that is the IPVS implementation of kube-proxy. Let’s look now at the iptables implementation of kube-proxy, which is an alternate implementation, which I have running in my other cluster.
Karthik P.: In this case, the way kube-proxy sets up rules is using the NAT table in iptables. So if I do an iptables-save, we can make this bigger, -t nat, and I’ll pipe it to less. Notice that in this case, kube-proxy has created a number of rules. So the first thing kube-proxy does is set up a jump on the output chain to KUBE-SERVICES. And within KUBE-SERVICES, if you look at what this does, for every service that is running in your cluster it creates a set of rules pointing to a particular other chain called a KUBE-SVC chain.
Karthik P.: And it also picks up the masquerade rules when masquerading is required. So as an example, notice that for the service that I just created, which is the NGINX open service, I have these two rules that say, here’s setting up my masquerading rules, and also I am going to do a jump to this particular KUBE-SVC chain. And so ultimately for every service in my cluster, including the kube-DNS service, including the services that users create, like the one we created for NGINX open, you’ll find those two rules and then a jump.
Karthik P.: And then if you follow that jump, in this case, for the NGINX open, the jump is to, let’s say, KUBE-SVC, this particular chain. So if you go look for that particular chain, which is right here, notice I have two rules, as well. What that first rule is saying is I’m going to do this load balancing using random with a probability of 50%. And for 50% of the cases, I’m going to jump to this KUBE-SEP with this ID, which is randomly generated.
Karthik P.: And for the remaining 50% of the cases, I’m going to jump to this other chain, the KUBE-SEP with a different ID. And if I go look at what those two chains are saying, ultimately XAD. So here is the second chain. What this is saying is for 50% of traffic I’m sending traffic to this one pod that is backing the service. And for the remaining 50% of the time, I am sending traffic to this other pod, which is also backing the service. So this is how kube-proxy does very, very rudimentary load balancing, by using a probability.
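The probability trick is worth working through once, because the rules are evaluated in order and each one only sees the traffic the earlier rules did not take. The Python sketch below computes the per-rule probabilities and the resulting share per pod. The demo only shows the two-pod case; extending it to n pods with probabilities 1/n, 1/(n-1), and so on is an illustration of how such a chain stays evenly balanced, not output from the cluster.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Match probability of each rule in a chain with n endpoints.

    The first rule matches 1/n of the traffic it sees, the next 1/(n-1)
    of what is left, and the last rule always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n):
    """Overall share of traffic each endpoint ends up receiving."""
    shares = []
    remaining = Fraction(1)  # traffic not yet claimed by an earlier rule
    for probability in rule_probabilities(n):
        shares.append(remaining * probability)
        remaining *= 1 - probability
    return shares


if __name__ == "__main__":
    print(rule_probabilities(2))  # the two-pod case from the demo
    print(effective_shares(3))
```

For two pods the first rule takes half of the traffic and the second rule takes whatever is left, which matches the 50/50 split described in the demo.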
Karthik P.: And ultimately, your traffic simply flows eventually to a pod. So if you’re troubleshooting kube-proxy, for example, if you’re trying to reach a particular service IP and you’re not able to reach the service IP, what you’ll be doing is going through the sequence of one, two, three, and four jumps for every service to see whether traffic is reaching that particular pod. A good way to troubleshoot that in practice, I’ll just give you a tip here, is you can use commands like watch -d iptables -vnL -t nat, on the NAT table, of course, which is where kube-proxy [inaudible 00:51:56] rules.
Karthik P.: And I’m going to grep with additional lines. Actually, let me do one thing here. I mean, the chain name. Let’s say I’m troubleshooting this particular chain. The chain name happens to be this one right here. So I do a watch -d … grep, with two lines. Whoops. There we go. There was a typo there. And in effect, as packets come and go into the service, you will see this value change. So if I remove this value and just show you the entire thing without the grep, [inaudible 00:52:59]
Karthik P.: So in effect, you will see the values change as packets come and go into the individual chain. Let me actually remove the grep altogether so you see how this will look in a typical environment. And as packets change, you’ll see the packets cause [inaudible 00:53:17] and because I’ve asked for differences in the watch command, you see where packets are coming in. So you need to go back and troubleshoot which chains are actually getting packets and which are not. And [inaudible 00:53:28] you can do this in production at scale, that we’ll come back and walk through in the next webinar, when we talk about things like policy, and being able to analyze and understand things like flow logs, as well.
Karthik P.: But at the raw level, you’re also able to get these statistics using just raw iptables. If you’re using the IPVS kube-proxy, you can do this using ipvsadm with the stats. And so you’ll use the watch command to see where the packets are coming in and which chains they are hitting. And if you’re not able to get to a destination service IP, i.e. NAT is not working correctly, then you can go back and troubleshoot the corresponding chains in that particular IPVS or iptables rule set. So let’s come back to summarizing troubleshooting here.
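What the watch command with differences gives you visually is a comparison of packet counters between two snapshots. The Python sketch below does the same comparison on two dictionaries of counters. How the counters are collected is left out on purpose, and the chain names are hypothetical placeholders rather than real chain IDs from the demo.

```python
def chains_receiving_traffic(before, after):
    """Return {chain: packet increase} for chains whose counter grew.

    `before` and `after` map chain names to packet counts taken at two
    points in time. Chains that only appear in `after` count from zero.
    """
    increases = {}
    for chain, count in after.items():
        delta = count - before.get(chain, 0)
        if delta > 0:
            increases[chain] = delta
    return increases


if __name__ == "__main__":
    # Hypothetical chain names and counter values for illustration.
    snapshot_1 = {"KUBE-SVC-EXAMPLE": 10, "KUBE-SEP-AAAA": 6, "KUBE-SEP-BBBB": 4}
    snapshot_2 = {"KUBE-SVC-EXAMPLE": 14, "KUBE-SEP-AAAA": 10, "KUBE-SEP-BBBB": 4}
    print(chains_receiving_traffic(snapshot_1, snapshot_2))
```

In this made-up example the service chain and one endpoint chain are moving while the other endpoint chain is flat, which is the kind of asymmetry you are looking for when one backing pod never receives traffic.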
Karthik P.: [silence 00:54:24]
Karthik P.: My browser just came back. All right, so now what we’re seeing is, ultimately, when you have to troubleshoot service connectivity, what are the different places you can look? First of all, are the pods that are backing a given service healthy? In other words, if you have a service that is being backed by three pods, Kubernetes is constantly doing liveness and readiness checks, based on the liveness and readiness checks that have been defined by the application in the manifest, to make sure that the pods are healthy. If the pods are not healthy, Kubernetes will kill them and try to restart them. But for whatever reason, sometimes the application fails, or there might be other factors where the pods cannot be restarted.
Karthik P.: Maybe you run out of capacity, maybe you don’t have enough memory or CPU for Kubernetes to restart the pods. So it’s possible that none of the pods in your service are healthy. [inaudible 00:55:16] is if you do a kubectl get service, and kubectl describe pod, see if the pods in the service are healthy. If they are healthy and the service IP is still valid, then the next thing you can check is whether kube-proxy is working correctly. And the way you would do that is by looking at the kube-proxy logs, to see if it is correctly [inaudible 00:55:40] the rules for the service.
Karthik P.: And part of that is also making sure that it is able to correctly write to iptables or IPVS to create the correct rules. And again, that should be logged in the kube-proxy log if it is unable to create the rules in IPVS or iptables. Now, ultimately, the fourth thing you will check is whether your iptables and IPVS are working correctly. So in other words, when a source pod is trying to reach that destination service, that those NAT rules are happening correctly.
Karthik P.: And for that, ultimately what you’re doing is you’re keeping track of iptables using techniques like tracing in iptables. Or, in the case of IPVS, you’re keeping track of IPVS logs. Both IPVS and iptables also log to the kernel logs, so you can actually look at those and see errors in the kernel if either of them has errors. So just wrapping up here, there are a number of other tools you can use. What I showed you so far is how you can do this imperatively, where you are going in and doing a kubectl exec, you are pinging, you are tracerouting, you are actually logging into the host and doing an ip route show, you’re doing a tcpdump [inaudible 00:56:52] you’re doing a Wireshark.
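The order of the checks matters, since each one only makes sense once the previous one has passed. As a small sketch, the sequence above can be written down as an ordered list with a helper that reports the first step that fails. The step labels are hypothetical names chosen for this illustration, and the pass/fail results are supplied by whoever runs the checks, not gathered by the code.

```python
# Ordered checks paraphrased from the talk; the keys are hypothetical labels.
CHECKS = [
    ("pods_healthy",
     "Are the pods backing the service healthy? "
     "(kubectl get service, kubectl describe pod)"),
    ("kube_proxy_ok",
     "Is kube-proxy running without errors in its logs?"),
    ("rules_written",
     "Was kube-proxy able to write its iptables or IPVS rules?"),
    ("nat_working",
     "Is NAT to the pods actually happening? "
     "(iptables tracing, IPVS stats, kernel logs)"),
]


def first_failing_check(results):
    """Return (key, question) for the first check that did not pass.

    `results` maps check keys to True/False. A missing key is treated
    as not yet verified, so it is reported as the next thing to look at.
    Returns None when every check passed.
    """
    for key, question in CHECKS:
        if not results.get(key, False):
            return key, question
    return None


if __name__ == "__main__":
    observed = {"pods_healthy": True, "kube_proxy_ok": True, "rules_written": False}
    print(first_failing_check(observed))
```

Working top down like this keeps you from digging through NAT rules when the real problem is that no healthy pod is backing the service.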
Karthik P.: There are tools that make this much more effective in actual production at scale. And a big part of dealing with this is how do you, as a [inaudible 00:57:11] operator, make this much more declarative, where end users can troubleshoot connectivity problems before they happen. Which are utilizing images or [inaudible 00:57:27] integrates with kubectl to make this whole thing much easier. Also, for Calico-related problems, we recommend the calicoctl command. Things like calicoctl node status and other tools like that will show you when there are connectivity issues.
Karthik P.: So I also recommend using calicoctl. And the last thing I will mention is if you want to do this as a cluster operator and make this troubleshooting information much more evident to users at scale, also come talk to Tigera, because we do have other products, like Tigera Secure, that are much more focused on operators, to help operators troubleshoot this at scale in production. And provide the right level of visualization tools and visibility to be able to give flow information out to end users. So with that said, I am going to stop. And in the four minutes left take any questions that we have.
Karthik P.: Also, if there are any questions, if you do want to follow up, reach out to us [inaudible 00:58:25] the tigera.io website, or email us at firstname.lastname@example.org and we’d be happy to take any follow-up questions that may [crosstalk 00:58:32]
Michael: Yeah. So if you guys have any questions, please ask them quickly. Unfortunately, the webinar is going to auto shut off in three minutes. So if this webinar does shut down, that’s why. Something we can’t change once the webinar is started. So three minutes for questions, otherwise, as Karthik said, we’ll take questions on our website, or by email.
Karthik P.: Yeah.
Michael: And also, as you mentioned, there will be another session covering the topic, so we can certainly review some of this, as well.
Karthik P.: Yeah. So, the questions that come up on the chat: there’s a question here asking, if we have two pods running on the same node, will they talk over a VXLAN tunnel or over just raw IP? If you have pods on the same node, you’re using the Linux kernel for the routing. And since you have routes in the Linux kernel that are /32 routes for each pod, ultimately everything in Calico is a routed hop. Even traffic from pod to pod on the same host will still go through the underlying host.
Karthik P.: But it’s not going to be encapped across VXLAN because, keep in mind, these are running on the same host. We have /32 routes for the source and the destination pod. And it’s still a routing hop. It still goes through Calico’s network policy, but it doesn’t necessarily encap into VXLAN, because ultimately VXLAN is being used for encapping in between nodes. Hopefully that answers your question. Happy you folks have questions. If not, we will …
Michael: We’ve only got a minute left. So I think at this point, I’d like to thank you guys all for attending this webinar. This was great information. Thank you so much, K.P., for presenting. And do check back, this presentation will be made live. And we can have copies of it sent to you if you like. And, well, enjoy. Again, we do two a month. The next webinar is yet to be scheduled, but it will be in this month, at the end of September. So look for a notice on that. Thank you all for attending and I hope you guys have a great day. Thank you.