Calico eBPF Data Plane Deep-Dive

Why Take the Lid Off if it’s Working?

Sometimes the best way to understand something is to take it apart and see how it works. This blog post will help you take the lid off your Calico eBPF data plane based Kubernetes cluster and see how the forwarding is actually happening. The bonus is, unlike home repairs, you don’t even have to try to figure out how to put it back together again!

The target audience for this post is users who are already running a cluster with the eBPF data plane, either as a proof-of-concept or in production. Therefore, we will not go through the steps to set up a cluster from scratch. If you would like to learn how to do that, the best starting point is this documentation.

In the best and most likely case, you will never have a data plane issue, and this knowledge will still help you make informed decisions about the Calico eBPF data plane and get the best from your future clusters. Knowledge is power!

If you are unlucky enough to experience future issues, being armed with a good understanding of the underlying technologies will help you to quickly grasp what might be going wrong and where to look for further clues, in order to resolve your own issue or help community members get the information they need to support you.

Therefore, this blog aims to:

  • encourage familiarity and understanding about the Calico eBPF data plane.
  • teach techniques for basic visualisation/learning/troubleshooting of the Calico eBPF data plane.

So without further ado, let’s take the lid off a cluster running the Calico eBPF data plane, brush aside the magic sprinkles, and see what’s really going on in there.

Checking The Basics are Right

Before we jump in too deep, we check the water; so, let’s make sure the basics look right. Since this post assumes you already have a running cluster with the Calico eBPF data plane, we won’t check every aspect of the prerequisites and configuration here. We will only check those aspects that would allow the cluster to work superficially, but in a degraded or non-performant state. Checking these basics is also a great way of confirming your understanding.

For your cluster to run the Calico eBPF data plane, you need a supported Linux distribution and the eBPF file system must be mounted at /sys/fs/bpf. For both of these requirements, things may appear to work even if the requirement is not met, because:

  • If Calico does not detect a compatible kernel, it will emit a warning and fall back to the standard Linux networking data plane.
  • The Calico eBPF data plane will generally appear to work okay without the file system mounted, but if the file system does not persist then pods will temporarily lose connectivity when Calico is restarted and host endpoints may be left unsecured (because their attached policy program will be discarded).

Since these requirements apply to the nodes themselves, I first need to SSH to a node. For brevity, I only check the master node here. First, I need the master node IP:

kubectl get nodes -o wide | grep -i master

Returns:

chris-bz-5vp4-kadm-ms        Ready    control-plane,master   144m   v1.20.8   10.128.0.65    34.132.164.249   Ubuntu 20.04.2 LTS   5.8.0-1035-gcp   docker://20.10.2

So next, I can SSH to the public IP. In my cluster, the username is ubuntu, so:

ssh ubuntu@34.132.164.249

Which leads to:

Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-1035-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 13:16:23 UTC 2021

  System load:  1.35              Processes:                143
  Usage of /:   7.6% of 48.29GB   Users logged in:          0
  Memory usage: 16%               IPv4 address for docker0: 172.17.0.1
  Swap usage:   0%                IPv4 address for ens4:    10.128.0.65

10 updates can be applied immediately.
5 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

Last login: Wed Jul  7 13:15:41 2021 from 51.6.186.139

Now, I can check that the node reports a supported kernel version and that the BPF file system is mounted appropriately:

uname -rv

Returns a valid kernel:

5.8.0-1035-gcp #37~20.04.1-Ubuntu SMP Thu Jun 17 16:04:29 UTC 2021

And:

mount | grep "/sys/fs/bpf"

Returns the expected output:

none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)

Now, I can disconnect:

logout

And:

Connection to 34.132.164.249 closed.
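If you have many nodes to check, the two tests above can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the minimum version of 5.3 is an assumption based on Calico's documented requirement for upstream kernels at the time of writing (some distributions backport eBPF features to older kernel numbers, so check the documentation for your release).

```shell
#!/bin/sh
# Hypothetical helpers for the two node-level checks. They work on text,
# so they can be fed from `uname -r` and `mount` on the node.

# kernel_at_least RELEASE MAJOR MINOR
# Succeeds when RELEASE (e.g. "5.8.0-1035-gcp") is >= MAJOR.MINOR.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# bpf_fs_mounted: reads `mount` output on stdin and succeeds when a bpf
# file system is mounted at /sys/fs/bpf.
bpf_fs_mounted() {
    grep -q ' on /sys/fs/bpf type bpf '
}

# Example use on a node (5.3 is an assumed minimum, see above):
#   kernel_at_least "$(uname -r)" 5 3 && echo "kernel looks new enough"
#   mount | bpf_fs_mounted && echo "bpf file system is mounted"
```

Both helpers only read text, so they can also be run against output you have already collected from a node.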

The next issue that could allow your Calico eBPF data plane based cluster to function in a sub-optimal state would be an incorrect configuration with relation to kube-proxy. You might recall that this data plane brings kube-proxy “in-house” – that is to say, it takes over kube-proxy’s responsibilities and re-implements them in the data plane’s eBPF hooks.

The most common symptom of a misconfiguration here is high CPU utilisation. Since the data plane implements kube-proxy’s functionality itself, kube-proxy should preferably be fully disabled:

kubectl get pods -n=kube-system | grep proxy | wc -l

This should return 0 (this command simply prints a list of all running kube-proxy pods and then counts the number of lines of output). Depending on the cluster type, the right technique for disabling kube-proxy varies.

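In fact, some clusters cannot disable kube-proxy. One example is K3s, which is packaged as a single binary, so the kube-proxy functionality cannot be turned off. In that case, the Felix setting bpfKubeProxyIptablesCleanupEnabled must be set to false, so that Felix does not try to remove the iptables rules that kube-proxy keeps writing. You can check the current value with: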

calicoctl get felixconfiguration default -o yaml

In my case, kube-proxy is off, so cleanup is still enabled:

apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: FelixConfiguration
  metadata:
    creationTimestamp: "2021-07-07T10:56:31Z"
    name: default
    resourceVersion: "211197"
    uid: 073ffcbb-2e57-46cc-ac85-7a04a1310f27
  spec:
    bpfKubeProxyIptablesCleanupEnabled: true
    bpfLogLevel: ""
    logSeverityScreen: Info
    reportingInterval: 0s
    vxlanEnabled: true
kind: FelixConfigurationList
metadata:
  resourceVersion: "238360"

Finally, looking again at the output above, you should use the VXLAN encapsulation, as in my example, or no encapsulation at all if your network allows it. This provides the best performance with the Calico eBPF data plane.

If your Calico eBPF data plane is fundamentally working, but performance remains sub-optimal or CPU utilisation remains high after reviewing the above, there are further troubleshooting steps here.
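The two kube-proxy checks from this section (the pod count and the bpfKubeProxyIptablesCleanupEnabled setting) can be combined into a single verdict. This is a sketch under stated assumptions: the function names and messages are hypothetical, and an absent key is treated as "true" on the assumption that this is Felix's default.

```shell
#!/bin/sh
# cleanup_setting: reads FelixConfiguration YAML on stdin and prints the
# value of bpfKubeProxyIptablesCleanupEnabled (empty if the key is absent).
cleanup_setting() {
    sed -n 's/^ *bpfKubeProxyIptablesCleanupEnabled: *//p'
}

# kube_proxy_verdict PODS CLEANUP
# PODS    = number of kube-proxy pods found
# CLEANUP = value of bpfKubeProxyIptablesCleanupEnabled; anything other
#           than "false" is treated as enabled (assumed default).
kube_proxy_verdict() {
    pods=$1 cleanup=$2
    if [ "$pods" -eq 0 ]; then
        echo "ok: kube-proxy is not running"
    elif [ "$cleanup" = "false" ]; then
        echo "ok: kube-proxy is running and Felix leaves its iptables rules alone"
    else
        echo "conflict: kube-proxy is running and Felix is cleaning up its rules"
    fi
}

# Example use against a live cluster:
#   pods=$(kubectl get pods -n kube-system | grep -c proxy)
#   cleanup=$(calicoctl get felixconfiguration default -o yaml | cleanup_setting)
#   kube_proxy_verdict "$pods" "$cleanup"
```

On the example cluster, with no kube-proxy pods, this would report the first "ok" case.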

Let’s move on to a theoretical packet flow, then a review of the available visibility and diagnostics tools and finally, a real packet walk demo.

A Theoretical Packet Flow Through a Calico eBPF Node

This excellent diagram (courtesy of Jan Engelhardt) helps to visualise the flow of network packets through Netfilter on a single Linux host (including on Kubernetes nodes). You can see a much larger version here. In the diagram, iptables packet filtering is included.

The standard Calico Linux/iptables data plane is implemented mostly in the green boxed area, since it is a layer-3 routed data plane. In contrast, at the time of writing the Calico eBPF data plane is primarily implemented as code attached to the eBPF hooks shown in the diagram, at the nodes labelled:

  • ingress (qdisc)
  • egress (qdisc)

Packet flow in Netfilter and General Networking

A qdisc (short for queuing discipline) is a scheduler component of the Linux Traffic Control Framework (tc).

The framework was originally designed for traffic shaping. Its main data structure is a tree of qdiscs. Some qdiscs support attaching eBPF programs; this was originally intended for advanced packet classification but the current implementation can do a lot more, including modifying the packet, redirecting it or dropping it.

Since not all qdiscs support eBPF, the kernel provides a no-op qdisc called clsact (classifier and action), which is solely for attaching an eBPF program. If no eBPF program is attached, this qdisc doesn’t do anything. It can coexist with other qdiscs. Calico ensures that a suitable qdisc is in place and then attaches eBPF programs, at:

  • node ingress
  • node egress

The programs that are attached are eBPF programs compiled at build time. Calico’s per-node agent, Felix, assembles policy programs at runtime. The pre-compiled programs jump to the policy programs.
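To see this on a node, you can look for the clsact qdisc on an interface and then list what is attached to it. The helper below is a small sketch with a hypothetical name; it only inspects the text printed by tc, and the interface name in the comments is just an example.

```shell
#!/bin/sh
# has_clsact: reads `tc qdisc show dev <interface>` output on stdin and
# succeeds when a clsact qdisc is present on that interface.
has_clsact() {
    grep -q '^qdisc clsact '
}

# Example use on a node (ens4 is an example interface name):
#   tc qdisc show dev ens4 | has_clsact && echo "clsact present"
#   tc filter show dev ens4 ingress   # filters/programs attached at ingress
#   tc filter show dev ens4 egress    # filters/programs attached at egress
```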

Node ingress, before the iptables pre-routing hooks:

[Figure: ingress eBPF hooks]

Node egress, after the iptables post-routing hooks:

[Figure: egress eBPF hooks]

Whilst the iptables hooks interleave throughout the pipeline, the eBPF hooks come at the beginning and end of the pipeline only. This leads to the requirement for the Calico eBPF data plane to replace kube-proxy’s functionality. Fortunately, this also gave the opportunity for the team to actually improve on the functionality, in particular by adding source IP preservation.

The first packet for each flow will be processed as usual through netfilter and general networking. Calico converts your policy into optimised eBPF bytecode, using eBPF maps to store the IP sets matched by policy selectors.

Flow setup:

[Figure: flow setup]

Established flow:

[Figure: established flow]

The logic to implement load balancing and packet parsing is pre-compiled ahead of time and relies on a set of eBPF maps to store the NAT front-end and back-end information. One map stores the metadata of the service, allowing for externalTrafficPolicy and “sticky” services to be honoured. A second map stores the IPs of the backing pods.
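The two-map idea can be illustrated with a toy model. This is emphatically not Calico's real map layout or lookup code: the table formats, the service id and the function name are all invented for illustration, and only the addresses are taken from the example cluster used later in this post. With a single back end the sketch always picks index 0, whereas a real load balancer would choose among several.

```shell
#!/bin/sh
# Toy model of a front-end/back-end lookup (NOT Calico's real map layout).
# Front-end table: service IP, port -> service id, number of back ends.
# Back-end table:  service id, index -> pod IP, pod port.
FRONTENDS='34.123.108.206 8385 1 1'
BACKENDS='1 0 192.168.244.208 8080'

# resolve_service IP PORT: print "podIP:podPort" for back end 0 of the
# matching service, or fail when IP:PORT is not a known service.
resolve_service() {
    id=$(printf '%s\n' "$FRONTENDS" |
        awk -v ip="$1" -v port="$2" '$1 == ip && $2 == port { print $3 }')
    [ -n "$id" ] || return 1
    printf '%s\n' "$BACKENDS" |
        awk -v id="$id" '$1 == id && $2 == 0 { print $3 ":" $4 }'
}

# Example:
#   resolve_service 34.123.108.206 8385
```

The point of the split is that the first lookup answers "is this a service, and how should it behave?", and only then does the second lookup pick a pod.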

qdiscs exist on all layer 2 interfaces. Therefore, the flow behaviour described above applies to all layer 2 interfaces, including the veth interfaces currently used for networking pods. Felix attaches ingress and/or egress programs to “cali”, “data” and “tunnel” interfaces where needed. Other interface types are handled as exception cases, in particular:

  • IP-in-IP tunnel interfaces
  • Wireguard interfaces

The diagram below helps to illustrate some of the locations at which Calico eBPF implements eBPF programs.

[Figure: tc BPF hooks]

Available Tools for Rough and Smooth Days (Calico eBPF Diagnostics and Visibility)

There are several tools available to help understand and diagnose packet flow in Calico eBPF data plane based Kubernetes clusters. All of these will be shown in practice on a real cluster later in this post, but first a quick summary:

The calico-bpf tool:

This tool formerly needed to be built and run by the administrator on the node. Now, it is included in the cnx-node container image used to build the calico-node pods, for easier use.

Since eBPF maps contain binary data, the Calico team wrote this tool to examine Calico’s eBPF maps. The tool allows the user to view and manipulate arp, connect-time load balancing programs, connection tracking, ipsets, nat, and routes.

Use of the tool is demonstrated later in this post but, in brief, you can run it as follows. First, get a calico-node pod name from your cluster:

kubectl get pod -A | grep calico-node

This returns the following, on the demo cluster:

calico-system calico-node-4dz5h 1/1 Running 0 5d2h
calico-system calico-node-4jrrr 1/1 Running 0 5d2h
calico-system calico-node-cwh85 1/1 Running 0 5d2h
calico-system calico-node-f6rdq 1/1 Running 0 5d2h
calico-system calico-node-p6qgz 1/1 Running 0 5d2h

Then, run the calico-bpf tool on whichever node you are interested in, as follows:

kubectl exec -n calico-system calico-node-cwh85 -- calico-node -bpf help

Which returns more information on how to use the tool:

tool for interrogating Calico BPF state

Usage:
  calico-bpf [command]

Available Commands:
  arp          Manipulates arp
  connect-time Manipulates connect-time load balancing programs
  conntrack    Manipulates connection tracking
  help         Help about any command
  ipsets       Manipulates ipsets
  nat          Manipulates network address translation (nat)
  routes       Manipulates routes
  version      Prints the version and exits

Flags:
      --config string   config file (default is $HOME/.calico-bpf.yaml)
  -h, --help            help for calico-bpf
  -t, --toggle          Help message for toggle

Use "calico-bpf [command] --help" for more information about a command.
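Since the kubectl exec prefix is always the same, a small wrapper saves typing. This is a convenience sketch only: the function name and the KUBECTL override are my own inventions, and the calico-system namespace is an assumption that matches the demo cluster (other installation methods may place calico-node in a different namespace).

```shell
#!/bin/sh
# calico_bpf POD SUBCOMMAND...
# Runs a calico-bpf subcommand inside the given calico-node pod.
# Set KUBECTL=echo to preview the command instead of running it.
calico_bpf() {
    pod=$1
    shift
    ${KUBECTL:-kubectl} exec -n calico-system "$pod" -- calico-node -bpf "$@"
}

# Example use:
#   calico_bpf calico-node-cwh85 help
#   calico_bpf calico-node-cwh85 ipsets dump
```

The preview mode is handy for checking which pod and subcommand you are about to run before touching the cluster.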

The tc tool:

To check if an eBPF program is dropping packets, a user can use the tc command-line tool. tc is a Linux tool that allows the user to show and manipulate traffic control settings. This tool is not specifically for use with Calico and can be used in many ways.

eBPF program debug logs:

Calico eBPF programs can be configured to produce detailed debug logging. The programs will log every packet, so this should not be turned on for day-to-day use, but to help specifically diagnose an eBPF program issue or to gain insights into how things are working, they can be very useful. To enable the logging, the bpfLogLevel Felix configuration needs to be set to Debug, and then tc exec bpf debug can be used as demonstrated later in this post.

WARNING! Enabling logs in this way has a significant impact on eBPF program performance.

The diagram below shows how this logging can be interpreted:

[Figure: log entry info]

The Example Environment

[Figure: eBPF demo example environment]

The example environment used for the next section contains:

  • 5 Kubernetes Calico eBPF data plane nodes running in GCP
  • VXLAN encapsulation and Direct Server Return (DSR) enabled
  • 1 GCP LoadBalancer service answering port 8385 (backed by a NodePort on all nodes, shown using the NodePort on chris-bz-5vp4-kadm-node-2 in all examples)
  • 1 deployment, echoserver, in the default namespace
  • 1 pod in that deployment, which has landed by chance on the node chris-bz-5vp4-kadm-node-0

This blog will review logs that use the Calico node pod name. Therefore, this table provides Calico node to Kubernetes node mappings:

Kubernetes Node              Calico Node Pod Name
chris-bz-5vp4-kadm-ms        calico-node-4dz5h
chris-bz-5vp4-kadm-node-0    calico-node-cwh85
chris-bz-5vp4-kadm-node-1    calico-node-4jrrr
chris-bz-5vp4-kadm-node-2    calico-node-f6rdq
chris-bz-5vp4-kadm-infra-0   calico-node-p6qgz
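On your own cluster, a mapping like this can be generated rather than written out by hand. The sketch below uses hypothetical function names, and it assumes that the calico-node pods carry the label k8s-app=calico-node and live in the calico-system namespace; adjust both if your installation differs.

```shell
#!/bin/sh
# list_mapping: prints "<kubernetes node> <calico-node pod>" pairs.
# Assumes the label k8s-app=calico-node and the calico-system namespace.
list_mapping() {
    ${KUBECTL:-kubectl} get pods -n calico-system -l k8s-app=calico-node \
        --no-headers -o custom-columns=NODE:.spec.nodeName,POD:.metadata.name
}

# pod_for_node NODE: reads "<node> <pod>" pairs on stdin and prints the
# calico-node pod running on NODE.
pod_for_node() {
    awk -v node="$1" '$1 == node { print $2 }'
}

# Example use:
#   list_mapping | pod_for_node chris-bz-5vp4-kadm-node-0
```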

Seeing the Real Thing on a Calico eBPF Cluster

Now that we know what the example setup used in this blog looks like, and we know what tools are available, let’s see what we can discover with each of them.

First, let’s use calico-bpf to examine IP sets, by looking at how the cluster’s network policy is implemented in BPF. The ip-allow-policy NetworkPolicy allows ingress only from sources matched by the selector ip-allow-set == 'true', which selects a NetworkSet containing the allowed IPs. Let’s see the policy:

calicoctl get networkpolicy ip-allow-policy -o yaml

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  creationTimestamp: "2021-07-09T10:35:15Z"
  name: ip-allow-policy
  namespace: default
  resourceVersion: 449629/
  uid: 01b6790a-07a8-4f86-aa9f-dad1621cc6b6
spec:
  ingress:
  - action: Allow
    destination: {}
    source:
      selector: ip-allow-set == 'true'
  order: 0
  types:
  - Ingress

Now, we can query the eBPF maps on all of the nodes. Recall that this is where we expect to see Calico storing the IP set for rapid use in policies implemented in the eBPF data plane.

The shortcut command I use below:

  • grabs a list of the calico-nodes running on the cluster
  • iterates over that list, printing the calico-node name of each
  • runs a command on each (calico-node -bpf ipsets dump 2>&1)
  • filters out some unwanted debugging output: (egrep -v 'Defaulted|descriptor')

for i in `kubectl get pod -o wide -n calico-system | grep calico-node | awk '{print $1}'`; do echo $i; echo "-----"; kubectl exec -n calico-system $i -- calico-node -bpf ipsets dump 2>&1 | egrep -v 'Defaulted|descriptor'; printf "\n"; done

Resulting in:

calico-node-4dz5h
-----
No IP sets found.

calico-node-4jrrr
-----
No IP sets found.

calico-node-cwh85
-----
IP set 0xbe0344436eda1942
51.6.186.139/32


calico-node-f6rdq
-----
No IP sets found.

calico-node-p6qgz
-----
No IP sets found.

Notice that the IP set is only consuming eBPF map resources on the node that is hosting a pod in the default namespace. In fact, after adding some extra pods in the namespace on the other nodes, and running the command again, we can see that now the IP set is present on the other nodes:

calico-node-4dz5h
-----
No IP sets found.

calico-node-4jrrr
-----
IP set 0xbe0344436eda1942
51.6.186.139/32


calico-node-cwh85
-----
IP set 0xbe0344436eda1942
51.6.186.139/32


calico-node-f6rdq
-----
IP set 0xbe0344436eda1942
51.6.186.139/32


calico-node-p6qgz
-----
IP set 0xbe0344436eda1942
51.6.186.139/32
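On a larger cluster, the per-node output becomes long, so it can help to reduce it to "which pods have which IP sets". The following sketch (with a hypothetical function name) assumes the output format produced by the loop shown earlier: a calico-node pod name on its own line, followed by that pod's dump.

```shell
#!/bin/sh
# nodes_with_ipsets: reads the loop output on stdin and prints one line
# per IP set found, as "<calico-node pod> <IP set ID>".
nodes_with_ipsets() {
    awk '
        /^calico-node-/ { node = $1; next }   # remember the current pod
        /^IP set /      { print node, $3 }    # pod name and IP set ID
    '
}

# Example use: pipe the output of the for loop into the helper:
#   for i in ...; do ...; done | nodes_with_ipsets
```

Pods that report "No IP sets found." simply produce no line.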

Now let’s use calico-bpf to examine a TCP flow through the cluster. The flow that will be examined here is the flow illustrated in the “The Example Environment” section above. It should be possible to see:

  • The TCP flow from the GCP LoadBalancer to the NodePort (with the client’s real source IP).
  • The VXLAN tunnel from the node answering the NodePort to the node hosting the service pod.
  • The reply going directly back to the client (DSR).

The shortcut command I use below:

  • sets two environment variables to identify the traffic we’re interested in.
  • grabs a list of the calico-nodes running on the cluster.
  • iterates over that list, printing the calico-node name of each.
  • runs a command on each (calico-node -bpf conntrack dump 2>&1).
  • keeps only the lines that match the IP and port we’re interested in (grep ${EBPF_INTERESTING_IP} | grep ${EBPF_INTERESTING_PORT}).

export EBPF_INTERESTING_IP=51.6.186.139 && export EBPF_INTERESTING_PORT=8385 && for i in `kubectl get pod -o wide -n calico-system | grep calico-node | awk '{print $1}'`; do echo $i; echo "-----"; kubectl exec -n calico-system $i -- calico-node -bpf conntrack dump 2>&1 | grep ${EBPF_INTERESTING_IP} | grep ${EBPF_INTERESTING_PORT}; printf "\n"; done

Resulting in:

calico-node-4dz5h
-----

calico-node-4jrrr
-----

calico-node-cwh85
-----
ConntrackKey{proto=6 51.6.186.139:42642 <-> 34.123.108.206:8385} -> Entry{Type:1, Created:430101698865672, LastSeen:430102010635728, Flags: <none> REVKey : ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}} Age: 7.022722432s Active ago 6.710952376s
ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080} -> Entry{Type:2, Created:430101698864303, LastSeen:430102010645876, Flags: ext-local Data: {A2B:{Seqno:3175947981 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:true Ifindex:2} B2A:{Seqno:1243493201 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:false Ifindex:0} OrigDst:34.123.108.206 OrigPort:8385 TunIP:10.128.1.107}} Age: 7.027186531s Active ago 6.715404958s CLOSED

calico-node-f6rdq
-----
ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080} -> Entry{Type:2, Created:430101684085697, LastSeen:430101996561527, Flags: np-fwd Data: {A2B:{Seqno:3175947981 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:true Ifindex:2} B2A:{Seqno:1243493201 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:false Ifindex:0} OrigDst:34.123.108.206 OrigPort:8385 TunIP:10.128.0.118}} Age: 8.577187402s Active ago 8.264711572s CLOSED
ConntrackKey{proto=6 51.6.186.139:42642 <-> 34.123.108.206:8385} -> Entry{Type:1, Created:430101684087346, LastSeen:430101996561527, Flags: <none> REVKey : ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}} Age: 8.580019102s Active ago 8.267544921s

calico-node-p6qgz
-----

Firstly, note that the only nodes that have recorded the flow are the two that we expected. The other nodes do not need to waste any resources recording the flow. Now, let’s massage the output a bit to make it more readable, and highlight the relevant information:

  • Pod IP/Port
  • VXLAN tunnel IP/Port
  • Client IP/Port
  • LoadBalancer IP/Port

calico-node-f6rdq
ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}
-> Entry{Type:2, Created:430101684085697, LastSeen:430101996561527, Flags: np-fwd Data: {A2B:{Seqno:3175947981 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:true Ifindex:2} B2A:{Seqno:1243493201 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:false Ifindex:0} OrigDst:34.123.108.206 OrigPort:8385 TunIP:10.128.0.118}} Age: 8.577187402s Active ago 8.264711572s CLOSED
ConntrackKey{proto=6 51.6.186.139:42642 <-> 34.123.108.206:8385}
-> Entry{Type:1, Created:430101684087346, LastSeen:430101996561527, Flags: <none>
REVKey : ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}} Age: 8.580019102s Active ago 8.267544921s

calico-node-cwh85
ConntrackKey{proto=6 51.6.186.139:42642 <-> 34.123.108.206:8385}
-> Entry{Type:1, Created:430101698865672, LastSeen:430102010635728, Flags: <none>
REVKey : ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}} Age: 7.022722432s Active ago 6.710952376s
ConntrackKey{proto=6 51.6.186.139:42642 <-> 192.168.244.208:8080}
-> Entry{Type:2, Created:430101698864303, LastSeen:430102010645876, Flags: ext-local Data: {A2B:{Seqno:3175947981 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:true Ifindex:2} B2A:{Seqno:1243493201 SynSeen:true AckSeen:true FinSeen:true RstSeen:false Whitelisted:true Opener:false Ifindex:0} OrigDst:34.123.108.206 OrigPort:8385 TunIP:10.128.1.107}} Age: 7.027186531s Active ago 6.715404958s CLOSED

Note that both the ingress node hosting the NodePort and the node hosting the service pod have two conntrack entries each. They are both tracking the full connection, including both public IPs and the service pod’s real IP.
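The long entries are easier to compare once they are reduced to the handful of fields highlighted above. The sketch below is a text filter with a hypothetical name; it relies only on the field labels visible in the dump output shown in this post (Type:2, Flags:, OrigDst:, OrigPort:, TunIP:) and makes no claims about what other versions of the tool print.

```shell
#!/bin/sh
# ct_summary: reads `calico-node -bpf conntrack dump` output on stdin and
# prints a one-line summary of each Type:2 entry (the ones carrying the
# Flags, OrigDst, OrigPort and TunIP fields in the output above).
ct_summary() {
    awk '
        /Type:2,/ {
            backend = $4; sub(/[^0-9.:].*/, "", backend)
            flags = $0; sub(/.*Flags: /, "", flags);  sub(/ Data:.*/, "", flags)
            dst = $0;   sub(/.*OrigDst:/, "", dst);   sub(/ .*/, "", dst)
            port = $0;  sub(/.*OrigPort:/, "", port); sub(/ .*/, "", port)
            tun = $0;   sub(/.*TunIP:/, "", tun);     sub(/[^0-9.].*/, "", tun)
            print "client=" $2, "backend=" backend, "flags=" flags, \
                  "orig=" dst ":" port, "tunnel=" tun
        }
    '
}

# Example use:
#   kubectl exec -n calico-system calico-node-f6rdq -- \
#       calico-node -bpf conntrack dump 2>&1 | ct_summary
```

For the calico-node-f6rdq entry above, this prints the client, the pod back end, the np-fwd flag, the original LoadBalancer destination and the tunnel IP on a single line.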

Now let’s see how the tc tool can help. tc can be run either within the cnx-node pod, as with calico-bpf, or directly in the host namespace of a node. This post will run it within the cnx-node pod for continuity. This also has the benefit that the administrator does not need to have access to the command line of the nodes.

Amongst many other useful functions, tc can show the drop counters for a particular interface. For example, one might be interested in seeing the drops on an interface facing a particular pod – in this case the echoserver-6558697d87-54jrg pod we used previously. To make the output easier to manage, target tc at a particular interface. One way to identify the interface of interest is to SSH to the node hosting the pod, then use netstat -nr and grep for the pod IP and note the target interface name. For example:

ssh ubuntu@35.225.233.46 'sudo apt install net-tools -y && netstat -nr' | grep 192.168.244.208

Returns:

192.168.244.208 0.0.0.0 255.255.255.255 UH 0 0 0 calif77d21d9720

This diagram showing the node running the target pod should help to clarify where this interface sits:

[Figure: hook point]

Finally, use the interface name and run tc on the node like this. The egrep is only being used to remove an unwanted debugging message:

kubectl exec -n calico-system calico-node-cwh85 -- tc -s qdisc show dev calif77d21d9720 2>&1 | egrep -v 'Defaulted'

The resulting output shows that 104 packets have been dropped:

qdisc noqueue 0: root refcnt 2 
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
backlog 0b 0p requeues 0
qdisc clsact ffff: parent ffff:fff1 
Sent 122748 bytes 1342 pkt (dropped 104, overlimits 0 requeues 0) 
backlog 0b 0p requeues 0

Sure enough, repeating the command after a curl from the allowed client does not cause an increment. However, an attempt from any other client does increment the counter:

qdisc noqueue 0: root refcnt 2 
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
backlog 0b 0p requeues 0
qdisc clsact ffff: parent ffff:fff1 
Sent 125582 bytes 1391 pkt (dropped 113, overlimits 0 requeues 0) 
backlog 0b 0p requeues 0
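Comparing counters by eye works for a one-off check; for repeated checks the dropped value can be extracted so that two readings can be subtracted. This is a sketch with a hypothetical function name, based purely on the tc -s output format shown above.

```shell
#!/bin/sh
# clsact_drops: reads `tc -s qdisc show dev <interface>` output on stdin
# and prints the "dropped" counter of the clsact qdisc only.
clsact_drops() {
    awk '
        /^qdisc / { in_clsact = ($2 == "clsact") }   # track current qdisc
        in_clsact && /dropped/ {
            for (i = 1; i < NF; i++)
                if ($i == "(dropped") {
                    v = $(i + 1)
                    sub(/,/, "", v)                   # strip trailing comma
                    print v
                }
        }
    '
}

# Example use (same command as above, sampled twice):
#   before=$(kubectl exec ... -- tc -s qdisc show dev calif77d21d9720 2>&1 | clsact_drops)
#   ... generate some traffic ...
#   after=$(kubectl exec ... -- tc -s qdisc show dev calif77d21d9720 2>&1 | clsact_drops)
#   echo "dropped since last check: $((after - before))"
```

Applied to the two outputs above, the readings are 104 and 113, a difference of 9 dropped packets.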

Finally, let’s see the output of the eBPF program debug logs. First, recall that, as noted earlier, enabling logs in this way has a significant impact on eBPF program performance.

I’ll do it on my cluster, so you don’t have to. First, let’s examine the existing bpfLogLevel:

calicoctl get FelixConfiguration default -o yaml

This results in:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
creationTimestamp: "2021-07-07T10:56:31Z"
name: default
resourceVersion: "211197"
uid: 073ffcbb-2e57-46cc-ac85-7a04a1310f27
spec:
bpfKubeProxyIptablesCleanupEnabled: true
bpfLogLevel: ""
logSeverityScreen: Info
reportingInterval: 0s
vxlanEnabled: true

Now let’s patch the resource:

calicoctl patch FelixConfiguration default -p '{"spec":{"bpfLogLevel": "Debug"}}'

Returns:

Successfully patched 1 'FelixConfiguration' resource

And, to see the logs, using the IP of the node:

ssh ubuntu@35.225.233.46 'sudo tc exec bpf debug'

As was noted previously, these logs are extremely verbose:

<idle>-0 [000] ..s. 448611.982865: 0: ens4-----E: ip->ttl 64
<idle>-0 [000] ..s. 448611.982865: 0: ens4-----E: Final result=ALLOW (0). Program execution time: 8545ns
<idle>-0 [000] ..s. 448611.984079: 0: ens4-----I: New packet at ifindex=2; mark=0
<idle>-0 [000] ..s. 448611.984080: 0: ens4-----I: IP id=43073 s=3306ba9f d=a800076
<idle>-0 [000] ..s. 448611.984081: 0: ens4-----I: IP id=43073 s=3306ba9f d=a800076
<idle>-0 [000] ..s. 448611.984082: 0: ens4-----I: TCP; ports: s=56016 d=22
<idle>-0 [000] ..s. 448611.984082: 0: ens4-----I: IP id=43073 s=3306ba9f d=a800076
<idle>-0 [000] ..s. 448611.984083: 0: ens4-----I: CT-6 lookup from 3306ba9f:56016
<idle>-0 [000] ..s. 448611.984084: 0: ens4-----I: CT-6 lookup to a800076:22
<idle>-0 [000] ..s. 448611.984084: 0: ens4-----I: CT-6 Hit! NORMAL entry.
<idle>-0 [000] ..s. 448611.984085: 0: ens4-----I: CT-6 result: 2
<idle>-0 [000] ..s. 448611.984085: 0: ens4-----I: conntrack entry flags 0x0
<idle>-0 [000] ..s. 448611.984086: 0: ens4-----I: CT Hit
<idle>-0 [000] ..s. 448611.984086: 0: ens4-----I: IP id=43073 s=3306ba9f d=a800076
<idle>-0 [000] ..s. 448611.984087: 0: ens4-----I: Entering calico_tc_skb_accepted
<idle>-0 [000] ..s. 448611.984087: 0: ens4-----I: src=3306ba9f dst=a800076

You can review the previous “Available Tools” section for a reminder of how to read these logs.
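One detail that is easy to trip over is that the logs print IPv4 addresses as hexadecimal numbers without leading zeros (for example d=a800076). A small helper, hypothetical and not part of any Calico tooling, converts them back to dotted form. It assumes the most significant byte is the first octet, which is consistent with the addresses seen elsewhere in this post.

```shell
#!/bin/sh
# hex_to_ip HEX: convert an IPv4 address printed as a hex number
# (e.g. "a800076") into dotted-quad notation.
hex_to_ip() {
    n=$((0x$1))
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 ))  $(( n & 255 ))
}

# Example use:
#   hex_to_ip a800076
#   hex_to_ip 3306ba9f
```

Here a800076 decodes to 10.128.0.118, which matches the TunIP recorded for this node in the conntrack output earlier, and 3306ba9f decodes to 51.6.186.159.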

Now let’s patch the resource back, to turn off this level of verbose debugging:

calicoctl patch FelixConfiguration default -p '{"spec":{"bpfLogLevel": ""}}'

Returns:

Successfully patched 1 'FelixConfiguration' resource
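Because leaving Debug logging on is costly, it can be worth wrapping the enable/capture/disable sequence so that the final step is never forgotten. The following is a sketch only: the function names and the CALICOCTL override are hypothetical, and it restores bpfLogLevel to the empty string, which assumes that was the value before you started (as it was on this cluster).

```shell
#!/bin/sh
# set_bpf_log_level LEVEL: patch bpfLogLevel in the default FelixConfiguration.
# Set CALICOCTL=echo to preview the commands instead of running them.
set_bpf_log_level() {
    ${CALICOCTL:-calicoctl} patch FelixConfiguration default \
        -p "{\"spec\":{\"bpfLogLevel\": \"$1\"}}"
}

# with_bpf_debug COMMAND...: enable Debug logging, run COMMAND, then switch
# logging back off (assumed original value: "") and return COMMAND's status.
with_bpf_debug() {
    set_bpf_log_level Debug || return
    status=0
    "$@" || status=$?
    set_bpf_log_level ""
    return "$status"
}

# Example use, capturing ten seconds of logs from one node:
#   with_bpf_debug ssh ubuntu@35.225.233.46 'sudo timeout 10 tc exec bpf debug'
```

If the original level on your cluster was something other than empty, change the restore step accordingly.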

Conclusion

There’s a lot to unpack here, and if you want to know even more you probably saw a thread to pull on. Remember that you can always join us on the Calico Users Slack channel – you’ll find other knowledgeable users and contributors there.

Hopefully this deep-dive met its aims of:

  • encouraging familiarity and understanding about the Calico eBPF data plane.
  • teaching techniques for basic visualisation/learning/troubleshooting of the Calico eBPF data plane.

If you want to read more about Calico’s eBPF data plane, you could start with these links:
https://docs.projectcalico.org/about/about-ebpf
https://docs.projectcalico.org/maintenance/ebpf/enabling-bpf
https://docs.projectcalico.org/maintenance/ebpf/

Thanks/Attributions

James Harris, for https://github.com/jmalloc/echo-server
Jan Engelhardt, for https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg
