Networking with Your Head in the Clouds

De-’Mistifying’ Kubernetes Networking in AWS

Let’s face it, networking has its roots in the physical world. One of the benefits of running your cluster in a public cloud like AWS is that you don’t have to manage or operate the underlying physical resources. If your business is not actually running a data center, this can be a substantial benefit. However, from a networking perspective, the physical networking constructs have to be mapped, in some way, to the more amorphous cloud world. A number of models have been used to do that in public clouds, but which one(s) make the most sense for a Kubernetes environment?

The network that is laid out by the cloud provider tries to simplify this, but, in some cases, is still very much focused on the node, say an EC2 instance, being the unit of compute or service. With that assumption in place, the cloud providers build a network that allows those nodes to communicate with one another and with the outside world. Building on that, there are usually a number of services, such as security and load balancing, that are delivered based on that model.

The problem is that, in a containerized world, the unit of compute is the container or pod hosted on the node, not the node itself. To address this mismatch, a few common networking models are deployed for container environments in cloud infrastructures today, including host-based NAT and overlays. Both of these can and do work, but both come with trade-offs.

In a host-based NAT world, all of the containers running on your node (an EC2 instance in AWS) will be NATed so that they use the node’s address. There are a number of shortcomings with this model, but from a Kubernetes viewpoint, the two most relevant ones are port collisions (two pods on the same node cannot use the same port, say port 443 for a web service) and loss of the pod address. Kubernetes is designed to give each pod a unique address, and a number of functions assume each pod will have that unique address (such as kube-proxy, many network policy implementations, etc.). This is probably not the way to implement the network for a cloud-hosted Kubernetes cluster.
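The port-collision problem can be illustrated with a toy model. This is only a sketch: the HostNatTable class, the pod names, and the node address are hypothetical, and a real NAT implementation lives in the kernel rather than in a Python dictionary.

```python
class HostNatTable:
    """Toy model of host-based NAT: every pod shares the node's IP address."""

    def __init__(self, node_ip):
        self.node_ip = node_ip
        self.port_owner = {}  # host port -> name of the pod that owns it

    def publish(self, pod, port):
        """Expose `pod` on the node's address at `port`, refusing duplicates."""
        owner = self.port_owner.get(port)
        if owner is not None:
            raise ValueError(
                f"port {port} on {self.node_ip} is already used by {owner}"
            )
        self.port_owner[port] = pod
        return (self.node_ip, port)


nat = HostNatTable("10.0.1.15")      # illustrative node address
first = nat.publish("web-a", 443)    # web-a is now reachable at the node's address

try:
    nat.publish("web-b", 443)        # a second pod asks for the same port
    collided = False
except ValueError:
    collided = True                  # only one pod per node can own port 443
```

Note that both pods are identified to the outside world by the node’s address, which is the second shortcoming: the pod’s own address is lost.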

The other common approach is to rely on an overlay network. In this model, you layer a network, to which all of the pods are attached, on top of the cloud provider’s network. Basically, you are sticking your fingers in your ears and trying to pretend that the underlying network doesn’t exist. Unfortunately for you, it does.

At best, the only real negative impact is that you are adding another layer of network encapsulation on top of however many layers of overlay encapsulation your cloud provider is doing. It’s turtles, all the way down.

However, it’s fairly likely that you need to get traffic into and out of your cluster. Now you need to link your overlay to the network you are trying to avoid. That link usually becomes a congestion point and possibly even a single point of failure. It certainly can be sub-optimal from a network traffic standpoint. Finally, you now need to troubleshoot two networks, the provider’s and yours. Isn’t one of the key reasons for deploying at a cloud hosting service to reduce the number of things you have to worry about, not increase it?

A Saner Approach

How about actually cooperating with the AWS network infrastructure, as much as possible? Project Calico allows you to do just that by working within the infrastructure rather than on top of it.

Let’s take a standard example where you have a Kubernetes cluster in AWS that spans 3 availability zones within a single VPC.

Each availability zone in a VPC looks like a contiguous subnet, which means that all the Calico nodes within that AZ should be able to connect directly to each other. By default, however, AWS applies a source/destination check to each EC2 instance, dropping traffic whose source or destination address does not belong to that instance, which includes traffic addressed to or from pod IPs. Since the VPC has a limit on how many routes its routing table can hold, we can’t tell the VPC about the pods’ addresses. So, you need to turn off that address checking. Amazon has published instructions on how to do that here. A synopsis, however, is to do the following:

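To disable source/destination checking using the console (Amazon’s steps refer to a NAT instance; here, apply them to each EC2 instance that hosts a Calico node)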

Open the Amazon EC2 console at http://console.aws.amazon.com/ec2/.
In the navigation pane, choose Instances.
Select the NAT instance, choose Actions, select Networking, and then select Change Source/Dest. Check.
For the NAT instance, verify that this attribute is disabled. Otherwise, choose Yes, Disable.

Once you have done that, all of the Calico nodes will establish direct network connections between one another within each AZ. Traffic will be flowing without the need for any overlay. Hoorah!

But the idea is to have a single cluster that spans multiple AZs. Since those are separate subnets, there needs to be a router between them, which Amazon provides. However, that router cannot handle the number of routes that are necessary in a Kubernetes cluster. Fear not, however: Calico can fix that.

You need to switch on IP-in-IP (IP-IP) encapsulation, but only for traffic that crosses subnet boundaries. This has been supported in Calico for a while. You do that by changing the configuration of your Calico IP pool. The instructions for doing so can be found in the Calico documentation set. The key is that you set the ipip enabled flag, and set the mode to cross-subnet. An example YAML file to do that is below:

$ calicoctl apply -f - << EOF
apiVersion: v1
kind: ipPool
metadata:
  cidr: 192.168.0.0/16
spec:
  ipip:
    enabled: true
    mode: cross-subnet
EOF

Now, Kubernetes traffic within a single AZ is native (not encapsulated), and traffic between AZs is encapsulated only as it crosses the AZ boundary, all automatically.
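The cross-subnet decision can be sketched in a few lines. This is a simplified illustration of the idea, not Calico’s actual implementation: the subnet CIDRs, node addresses, and the needs_ipip helper are all assumptions made up for the example.

```python
import ipaddress

# Assumed layout: one VPC subnet per availability zone (illustrative CIDRs).
AZ_SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24")
]


def subnet_of(node_ip):
    """Return the AZ subnet that contains a node's address."""
    addr = ipaddress.ip_address(node_ip)
    for net in AZ_SUBNETS:
        if addr in net:
            return net
    raise ValueError(f"{node_ip} is not in any known AZ subnet")


def needs_ipip(src_node_ip, dst_node_ip):
    """Encapsulate only when the two nodes sit in different subnets."""
    return subnet_of(src_node_ip) != subnet_of(dst_node_ip)


same_az = needs_ipip("10.0.1.10", "10.0.1.20")   # both nodes in the first subnet
cross_az = needs_ipip("10.0.1.10", "10.0.2.20")  # nodes in different subnets
```

Here same_az comes out False (traffic is sent natively) and cross_az comes out True (traffic is wrapped in IP-IP for the hop across the subnet boundary).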

Lastly, you may need to access resources outside of your cluster, say on the public Internet. Since your cluster is probably using private address space, that egress traffic will have to be NATed. Again, Calico has a mechanism to do that automatically, with the NAT performed on the node itself. The Calico documentation describes how to enable that feature, and the YAML configuration is listed below:

$ calicoctl apply -f - << EOF
apiVersion: v1
kind: ipPool
metadata:
  cidr: 192.168.0.0/16
spec:
  ipip:
    enabled: true
    mode: cross-subnet
  nat-outgoing: true
EOF
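As a rough mental model of what the nat-outgoing flag asks for, the sketch below assumes that traffic is NATed only when it leaves a pod in the pool for a destination outside every Calico pool. The needs_outgoing_nat helper and the sample addresses are hypothetical; only the 192.168.0.0/16 pool comes from the configuration above.

```python
import ipaddress

# The pool from the configuration above; a cluster may define more than one.
POD_POOLS = [ipaddress.ip_network("192.168.0.0/16")]


def in_pod_pool(ip):
    """True if the address belongs to any configured pod pool."""
    addr = ipaddress.ip_address(ip)
    return any(addr in pool for pool in POD_POOLS)


def needs_outgoing_nat(src_ip, dst_ip):
    """NAT only pod traffic that is leaving the pod address space."""
    return in_pod_pool(src_ip) and not in_pod_pool(dst_ip)


pod_to_pod = needs_outgoing_nat("192.168.1.5", "192.168.40.9")    # stays in the pool
pod_to_internet = needs_outgoing_nat("192.168.1.5", "203.0.113.10")  # leaves the pool
```

Under this assumption, pod-to-pod traffic keeps its pod addresses (pod_to_pod is False), while traffic to an outside address is rewritten to the node’s address on the way out (pod_to_internet is True).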

At this point, your Kubernetes cluster is communicating within the cluster in the most efficient manner possible, and is able to reach the outside world. Your connectivity issues are all sorted.

What About Security?

Now that you have your pods talking to each other, and the outside world, you probably want to start securing them. I’ve blogged on the concept of micro-segmentation in a container world in a two-part series here and here, so I won’t repeat myself. However, the key thing to keep in mind is that Project Calico’s network policy capabilities are fully functional, no matter how it is deployed. That means that all of the fine-grained, application-centric policy mechanisms that Calico is known for are available to you when you use Calico as your networking layer for Kubernetes on AWS.

Looking Forward

A number of our customers are looking at AWS’s Transit VPC to interconnect multiple VPCs, and often their on-premises infrastructure as well. Transit VPCs use the same routing technology (BGP) that Calico does, meaning that Calico can run natively over that transit infrastructure, even back to your private cloud.

It is almost a certainty that Amazon will continue its pace of innovation in the cloud infrastructure space, so one can only assume that the networking capabilities of AWS will continue to evolve, and possibly become even more container-aware than they already are. Due to the flexibility of Calico’s networking model, it is almost guaranteed that even if the AWS networking model changes, your Calico installation, and all of the policy enforcement that you have come to rely on, will continue to work: on AWS now, on AWS in the future, and wherever else you decide to deploy Kubernetes.