How to maximize K3s resource efficiency using Calico’s eBPF data plane

Amazon’s custom-built Graviton processor allows users to create ARM instances in the AWS public cloud, and Rancher K3s is an excellent way to run Kubernetes in these instances. By allowing a lightweight implementation of Kubernetes optimized for ARM with a single binary, K3s simplifies the cluster initialization process down to executing a simple command.

In an earlier article, I discussed how ARM architecture is becoming a rival to x86 in cloud computing, and steps that can be taken to leverage this situation and be prepared for this new era. Following the same narrative, in this article I’ll look at an example of the Calico eBPF data plane running on AWS, using Terraform to bootstrap our install to AWS, and Rancher K3s to deploy the cluster.

A few changes to Calico are needed for ARM compatibility, including updating parts, enabling eBPF, and compiling operators for the ARM64 environment:.

  • Tigera Operator Tigera Operator is the recommended way to install Calico.
  • go-build go-build is a container environment packed with all the utilities that Calico requires in its compilation process.
  • Calico-node Calico-node is the pod that hosts Felix (i.e. it is the brain that carries control plane decisions fto your cluster).

Let’s discuss how to run an ARM-powered Kubernetes cluster equipped with Calico for security and eBPF as the data plane.

What is eBPF?

eBPF is a virtual machine embedded within the Linux kernel. It allows small programs to be loaded into the kernel, and attached to hooks, which are triggered when some event occurs. This allows the behavior of the kernel to be customized. While the eBPF virtual machine is the same for each type of hook, the capabilities of the hooks vary considerably. Since loading programs into the kernel would otherwise be a security risk, the kernel runs all programs through a very strict static verifier; the verifier sandboxes the program, ensuring it can only access allowed parts of memory and ensuring that it terminates quickly.

Calico’s eBPF data plane makes use of BPF functionality to allow source IP preservation, Direct Server Return (DSR) and even better performance. A full explanation of eBPF and its capabilities is out of the scope of this article, but if you are interested in learning more you can check out this post, which talks about eBPF capabilities and why you should use it.

Performance

Calico eBPF data plane is an alternative to Calico’s standard Linux data plane (based on iptables) that pushes the performance limits further. Calico leverages BPF programs to process packets rapidly and efficiently without ever leaving the packet processing context of the Linux kernel. The efficiency achieved by this method is close to natively compiled code in the kernel.

Benchmarks can help determine if a technology can live up to its promises. Although numbers might differ in various environments, the end result should be similar in most cases.

Both data planes are capable of pushing data close to the maximum available bandwidth of the wire. In this case, after 10 seconds, we can see the maximum bandwidth was reached.

Note: AWS has a limit on single flows of traffic between EC2 instances that restricts this type of traffic to 5Gbps.

The chart below shows the total CPU utilization consumed by the benchmarks, measured in vCPUs. This is the sum of both client and server CPU utilization.

These statistics were collected using two A1.xlarge instances, this K8s benchmark suite, and multiple instances of iperf with parallel connections to each other. Note that AWS has a limit on single flows of traffic between EC2 instances that limits this type of traffic to 5Gbps.

Demo

This section provides you with the steps necessary to deploy a Rancher K3s ARM-based cluster equipped with Calico’s eBPF data plane on AWS public cloud infrastructures.

Before we begin

This demo section uses A1 instances from Amazon. In addition to an Amazon account, make sure the following applications are installed on your system:

Cluster preparation

Use the following command to download the Terraform template cluster:

git clone https://github.com/frozenprocess/demo-cluster.git

Browse to the calico-k3s-aws folder within demo-cluster:

cd calico-k3s-aws

Inside the calico-k3s-aws folder, you will find an example variable file called terraform.tfvars-example. Removing the example suffix from this file provides an easy way to modify some of the attributes of the cluster that will be deployed using Terraform.

For this example, since our document focuses on setting up an ARM environment, you need to modify the following values accordingly:

availability_zone_names = ["us-west-2a","us-west-2c"]
image_id = "ami-06d1fcb7a93046a55"
instance_type = "m6g.large"
Note: At the time of writing, AWS us-west-2 region only offers ARM64 instances in us-west-2a and us-west-2c availability zones.

Terraform uses “providers” in order to connect to a variety of environments. Use the following command to download the “providers” related to this demo:

terraform init

Now that our Terraform project is ready, use apply to populate the resources. Two vanilla Ubuntu EC2 VMs, one VPC, one IGW, two subnets, and one default route will be created after running the following command.

Note: Executing the following command will create resources in your AWS Account. Take a minute to review the resources before approving the deployment.
terraform apply

There should be an output similar to the following after a successful deployment:

instance_1_private_ip = "172.16.2.183"
instance_1_public_ip = "34.219.2.61"
instance_2_private_ip = "172.16.1.81"
instance_2_public_ip = "54.187.109.181"

K3s server installation

K3s server is the process that plays the role of control plane. This installation process creates important files that are crucial in controlling and maintaining the cluster.

Use instance_1_public_ip value and the calico-demo.pem file to connect to instance one.

Note: In Windows, use PuTTY to complete this step. More information can be found here.
ssh ubuntu@34.219.2.61 -i calico-demo.pem

K3s’s single binary approach provides a nearly magical experience for installing a cluster. Simply run the following command and the cluster will be ready in a matter of seconds!

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.3+k3s1 K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="--flannel-backend=none --tls-san=k3s.local --cluster-cidr=192.168.0.0/16 --disable-network-policy --disable=traefik" sh -

Despite its tiny size, K3s is packed with a lot of features. In this demo, we are going to disable Flannel, Traefik, and K3s default network policy to simplify the process. If you would like to know more about these features, please check out this link.

Let’s check the state of our cluster using kubectl:

kubectl get pods -A

You should see a result similar to the below. Notice that all of the pods are in pending state. This is because there is no CNI installed in our cluster.

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-5ff76fc89d-pm2f5 0/1 Pending 0 23s
kube-system metrics-server-86cbb8457f-6mmbd 0/1 Pending 0 23s
kube-system coredns-7448499f4d-g6d9c 0/1 Pending 0 23s

Installing Calico

Tigera Operator is a great way to install Calico. Operators are a great way to interact with custom resources in a controlled manner.

Note: Calico ARM64 eBPF support is in its early stages. You’ll need to modify the tigera-operator manifest and add -arm64 to the end of the image: quay.io/tigera/operator:master line. In the below command, I’ve used sed to automatically do the procedure. However, if you are using Windows or having trouble with the following command, consider doing this step manually using a text editor and then applying your modified manifest using kubetl create -f <<myfile.yaml>>.

Use the following command to install Tigera Operator on the cluster:

curl https://docs.projectcalico.org/master/manifests/tigera-operator.yaml | sed 's#:master#:master-arm64#' | kubectl create -f -

It is possible to verify operator installation using the following command:

kubectl get pods -n tigera-operator

There should be an output similar to the following:

NAME READY STATUS RESTARTS AGE
tigera-operator-86c4fc874f-86x8r 1/1 Running 0 56s

Calico is packed with a lot of features and tunable to its basic components. After installing the operator, it will constantly check for a configuration in the default namespace that contains the kind: Installation header to configure Calico in the cluster.

Use the following command to begin the installation:

kubectl create -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
name: default
spec:
calicoNetwork:
bgp: "Disabled"
ipPools:
- blockSize: 26
cidr: 192.168.0.0/16
encapsulation: VXLAN
natOutgoing: Enabled
nodeSelector: all()
EOF

Use the following command to verify Calico deployment has finished:

kubectl rollout status ds/calico-node -n calico-system

There should be an output similar to the following:

daemon set "calico-node" successfully rolled out

At this point, we have a working node in our K3s cluster, equipped with Calico using the standard Linux data plane.

K3s agent installation

Adding K3s worker nodes is as easy as installing the server. Simply copy the token:

sudo cat /var/lib/rancher/k3s/server/node-token

There should be an output similar to the following:

K10d650693ae9d1c33239dee97b00c5a5f669c9921525f9b02f83c68cd7decae829::server:695a8f016aeb78245bb527f81fe42cd6

Use the instance_2_public_ip value and calico-demo.pem that was created earlier by our Terraform project, and SSH into the second instance:

ssh ubuntu@54.187.109.181 -i calico-demo.pem

Use the following command to install the agent and join the worker node to the cluster.

Note: Please change the IP address, port, and token in the following command to suit your environment. You need to specify the internal IP `instance_1_private_ip` of the first node, not the external IP.
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.3+k3s1 K3S_URL=https://172.16.2.183:6443 K3S_TOKEN=K10d650693ae9d1c33239dee97b00c5a5f669c9921525f9b02f83c68cd7decae829::server:695a8f016aeb78245bb527f81fe42cd6 sh -

Back at the server, execute the following command to verify the newly installed worker:

kubectl get nodes

There should be an output similar to the following:

NAME STATUS ROLES AGE VERSION
ip-172-16-2-183 Ready control-plane,master 7m39s v1.21.1+k3s1
ip-172-16-1-81 Ready <none> 16s v1.21.1+k3s1

Enable eBPF

By default, Calico is set to use the iptables data plane, which provides a service named kubernetes in the default namespace and proxies it to the API server using kube-proxy pods.

Since Calico can take over the kube-proxy responsibilities in eBPF mode, we can safely remove these pods. However, we have to allow Calico to directly talk to the api-server in order to prevent any interruption.

Use the following command to determine the api-server information:

kubectl get endpoints kubernetes -o wide
NAME ENDPOINTS AGE
kubernetes 172.16.1.211:6443 4m18s

By using a configmap, we can tell Calico how to directly contact the cluster API server.

Note: Address and port might differ. Make sure these values are typed correctly, since an incorrect value can trigger a crash loop for `tigera-operator` pod.

Use the following command to create the required configmap:

cat << EOF > kubernetes-services-endpoint.yaml
kind: ConfigMap
apiVersion: v1
metadata:
name: kubernetes-services-endpoint
namespace: tigera-operator
data:
KUBERNETES_SERVICE_HOST: "172.16.2.183"
KUBERNETES_SERVICE_PORT: "6443"
EOF

Use kubectl apply to apply the file and create the ConfigMap:

Kubectl apply -f kubernetes-services-endpoint.yaml

It might take 60 seconds for the configmap to be picked up by the cluster. After that, use the following command to restart the operator in order to pick up the changes:

kubectl delete pod -n tigera-operator -l k8s-app=tigera-operator

K3s embeds the kube-proxy process, which makes it hard to disable. Since both kube-proxy and eBPF are trying to interact with cluster data flow, we must change the Felix configuration parameter BPFKubeProxyIptablesCleanupEnabled to false. If both kube-proxy and BPFKubeProxyIptablesCleanupEnabled are enabled, kube-proxy will write its iptables rules and Felix will try to clean them up, resulting in iptables flapping between the two.

Export the cluster credentials file that was created by the server installation process:

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

Use the following command to disable iptables cleanup:

calicoctl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptablesCleanupEnabled": false}}'

Execute the following command to change the cluster data plane with eBPF:

kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF", "hostPorts":null}}}'

Let’s verify our cluster’s health and node architecture by executing the following command:

kubectl get nodes -L kubernetes.io/arch
NAME STATUS ROLES AGE VERSION ARCH
ip-172-16-2-244 Ready <none> 65s v1.21.3+k3s1 arm64
ip-172-16-1-224 Ready control-plane,master 2m21s v1.21.3+k3s1 arm64

That’s it! You now have a multi-node K3s cluster secured with Calico that uses the eBPF data plane.

Clean up

In the demo folder, issue the following command to remove all of the resources created for this demo:

terraform destroy -auto-approve

Conclusion

K3s’s minimalistic approach makes it invaluable for resource-constrained environments, and its single binary approach allows for faster deployment across any infrastructure. Users can push this resource efficiency even further by using the Calico eBPF data plane, while also benefiting from Calico’s other features such as its feature-rich network policy language, BGP routing, and more.

In this article, I explored how to install a multi-node K3s cluster equipped with Calico, and how to swap the standard Linux data plane with eBPF. I’ve only scratched the surface of Calico’s eBPF capabilities here. For more information, take a look at our eBPF documentation.

Did you know you can become a certified Calico operator? Learn Kubernetes and container networking and security fundamentals using Calico in this free, self-paced certification course.

Join our mailing list

Get updates on blog posts, new releases and more!