The Linux kernel is useful for implementing networking, observability, and security features, but it can also present difficulties. Whether adding modules or modifying kernel source code, developers have typically found they need to deal with abstracted layers and intricate infrastructure that are hard to debug. Extended Berkeley Packet Filter (eBPF) addresses both these issues.
eBPF is a kernel technology (fully available since Linux 4.4) that lets programs run without adding kernel modules or modifying the kernel source code. You can think of it as a lightweight, sandboxed virtual machine (VM) within the Linux kernel that runs Berkeley Packet Filter (BPF) bytecode with access to certain kernel resources.
Using eBPF removes the need to modify the kernel source code and makes it easier for software to build on existing layers. As a result, this technology can fundamentally change how services such as observability, security, and networking are delivered.
Here are some of the important use cases for eBPF.
Combining the ability to see and interpret all system calls with packet-level and socket-level views of all networking operations enables new approaches to system security.
Typically, entirely independent systems have handled system call filtering, process context tracing, and network-level filtering. eBPF, by contrast, combines control and visibility over all of these aspects. This allows you to develop security systems that operate with more context and a finer level of control.
The combination of efficiency and programmability makes eBPF a good fit for the packet processing requirements of networking solutions. Its programmability makes it possible to add protocol parsers and to program forwarding logic to address changing requirements without ever leaving the Linux kernel's packet processing context. The efficiency provided by the JIT compiler delivers execution performance close to that of natively compiled in-kernel code.
The ability to attach eBPF programs to trace points in addition to kernel and user application probe points enables visibility into the runtime behavior of applications as well as the system.
Because eBPF provides introspection on both the system side and the application side, the two views can be combined. This gives unique and powerful insights for troubleshooting system performance issues. Advanced statistical data structures let you extract useful visibility data efficiently, without exporting the huge amounts of sampling data typical of similar systems.
Rather than relying on gauges and static counters exposed by the operating system, eBPF allows for the generation of visibility events and the collection and in-kernel aggregation of custom metrics based on a broad range of potential sources.
This increases the depth of visibility that can be achieved and dramatically decreases overall system overhead. Both effects come from collecting only the required visibility data and from producing histograms and similar data structures at the source of the event, rather than depending on the export of samples.
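To make the idea of aggregating at the source concrete, here is a minimal sketch in plain Python. It is a user-space simulation, not eBPF code: it folds a stream of latency samples into a power-of-two histogram so that only a handful of bucket counters, rather than every sample, would need to be exported. The function names and sample values are illustrative assumptions.

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Return the power-of-two bucket index for a non-negative integer.

    Bucket 0 holds 0 and 1; bucket k holds values in [2**k, 2**(k+1) - 1].
    """
    return max(value.bit_length() - 1, 0)


def aggregate(samples):
    """Fold samples into bucket counters without keeping the raw samples."""
    hist = Counter()
    for value in samples:
        hist[log2_bucket(value)] += 1
    return hist


# Hypothetical latency samples in microseconds.
samples = [3, 5, 6, 9, 120, 130, 1000]
hist = aggregate(samples)

for bucket in sorted(hist):
    low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
    print(f"{low:>5} .. {high:<5} : {hist[bucket]}")
```

However many samples arrive, the amount of data that has to leave the aggregation point is bounded by the number of buckets.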
eBPF programs run inside the Linux kernel, where they can access kernel services and hardware resources. They are used for debugging, tracing, firewalls, networking, and more.
Developed out of a need for improved Linux tracing tools, eBPF was influenced by DTrace, a dynamic tracing tool available mainly for BSD and Solaris operating systems. Unlike DTrace, earlier Linux tracing tools could not provide a global overview of running systems. Rather, they were restricted to specific frameworks for library calls, functions, and system calls.
eBPF is an extension of its precursor, BPF, a tool for writing packet-filtering code that runs in an in-kernel VM. A group of engineers started to build on the BPF backend to offer a set of features similar to DTrace, and this work eventually evolved into eBPF. Although eBPF was first released in limited form in 2014 with Linux 3.18, you need Linux 4.4 or above to make full use of it.
The diagram below is a simplified illustration of eBPF architecture. Before being loaded into the kernel, the eBPF program must pass a series of checks. Verification involves simulating the execution of the eBPF program in the virtual machine.
This allows the verifier, which comprises more than 10,000 lines of code, to carry out a set of checks. The verifier walks the possible paths the eBPF program might take when executed in the kernel to ensure the program runs to completion without looping indefinitely, which would result in a kernel lockup.
Additional checks, covering program size, valid register state, and out-of-bounds jumps, are also performed. These safety controls distinguish eBPF from Linux Loadable Kernel Modules (LKMs).
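The kinds of checks described above can be illustrated with a toy model. The sketch below is plain Python and bears no relation to the real verifier's implementation: it uses a made-up four-opcode instruction set and an assumed size limit, and it rejects programs that are too large, jump out of bounds, jump backwards (a possible loop), or do not end in an exit.

```python
MAX_INSNS = 4096  # assumed size limit for this toy model


def verify(program):
    """Return (ok, reason) for a list of (opcode, arg) tuples.

    Toy opcodes: "mov", "add", "jmp" (arg is an offset relative to the
    next instruction), and "exit".
    """
    if not program:
        return False, "empty program"
    if len(program) > MAX_INSNS:
        return False, "program too large"
    if program[-1][0] != "exit":
        return False, "last instruction must be exit"
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            target = pc + 1 + arg
            if not 0 <= target < len(program):
                return False, f"jump out of bounds at {pc}"
            if target <= pc:
                return False, f"back-edge (possible loop) at {pc}"
        elif op not in ("mov", "add", "exit"):
            return False, f"unknown opcode at {pc}"
    return True, "ok"


straight_line = [("mov", 1), ("jmp", 1), ("add", 2), ("exit", 0)]
looping = [("mov", 1), ("jmp", -2), ("exit", 0)]
wild_jump = [("jmp", 5), ("exit", 0)]

for name, prog in [("straight_line", straight_line),
                   ("looping", looping),
                   ("wild_jump", wild_jump)]:
    print(name, verify(prog))
```

Rejecting every backward jump is the simplest way to guarantee termination in a model like this; the real verifier reasons about register state and execution paths in far more detail.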
If all checks pass, the eBPF program is loaded and compiled into the kernel at a location in a code path, where it waits for the appropriate signal. When the signal arrives in the form of an event, the eBPF program runs in that code path. Once triggered, the bytecode executes and collects information according to its instructions.
To summarize, the role of eBPF is to allow programmers to execute custom bytecode safely within the Linux kernel, without adding to or changing the kernel source code. Though it cannot replace LKMs altogether, eBPF lets programs introduce custom code that interacts with protected hardware resources while posing limited risk to the kernel.
In many cases, you might use eBPF indirectly through a project like bpftrace or Cilium. These projects offer abstractions on top of eBPF, so you don’t have to write the program directly. You can specify definitions based on intent, which eBPF then implements.
If no higher-level abstraction exists, you need to write the programs directly. The Linux kernel requires that you load eBPF programs in bytecode form. While it is technically possible to write bytecode directly, this is not a popular option. Instead, developers usually compile pseudo-C code into eBPF bytecode using a compiler suite such as LLVM.
The architecture of an extended Berkeley Packet Filter includes the following elements.
eBPF programs are event-driven: they run when the kernel or an application passes a certain point, known as a hook point. Hooks are predefined and include network events, system calls, function entry and exit, and kernel tracepoints. If there is no predefined hook for a certain requirement, you can create a kernel probe (kprobe) or user probe (uprobe).
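The event-driven model can be pictured as a registry that maps hook points to attached programs. The following is a plain-Python sketch, not a kernel API: the `HookRegistry` class, the hook names, and the context dictionaries are all hypothetical and exist only to show that a program runs when, and only when, its hook fires.

```python
from collections import defaultdict


class HookRegistry:
    """Toy model of hook points with programs attached to them."""

    def __init__(self):
        self._programs = defaultdict(list)

    def attach(self, hook, program):
        """Attach a callable to a named hook point."""
        self._programs[hook].append(program)

    def fire(self, hook, ctx):
        """Run every program attached to the hook and collect return codes."""
        return [program(ctx) for program in self._programs.get(hook, [])]


exec_counts = defaultdict(int)


def count_exec_by_pid(ctx):
    """Example program: count events per process ID."""
    exec_counts[ctx["pid"]] += 1
    return 0


registry = HookRegistry()
registry.attach("syscall:execve", count_exec_by_pid)

registry.fire("syscall:execve", {"pid": 42})
registry.fire("syscall:execve", {"pid": 42})
unattached = registry.fire("syscall:openat", {"pid": 7})  # nothing attached
```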
Once a hook is identified, the BPF system call can be used to load the corresponding eBPF program into the Linux kernel. This usually involves using an eBPF library. When a program is loaded into the kernel, it has to be verified to ensure it is safe to run.
Validation takes into account conditions such as:
An eBPF program must be able to store its state and share collected data. eBPF maps let programs store and retrieve information in a range of data structures. Maps can be accessed from eBPF programs as well as from user-space applications, which reach them via a system call.
Map types include hash tables, arrays, ring buffers, stack traces, least recently used (LRU) maps, longest prefix match (LPM) maps, and more.
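As an illustration of one of these map types, the sketch below models longest prefix match lookup in plain Python using the standard `ipaddress` module. The `LpmMap` class and its `update` and `lookup` methods are hypothetical names chosen for this example, and the linear scan is a simplification that only demonstrates the lookup semantics.

```python
import ipaddress


class LpmMap:
    """Toy longest-prefix-match map keyed by CIDR prefixes."""

    def __init__(self):
        self._entries = {}

    def update(self, cidr, value):
        """Insert or overwrite the value stored for a prefix."""
        self._entries[ipaddress.ip_network(cidr)] = value

    def lookup(self, addr):
        """Return the value of the most specific matching prefix, or None."""
        ip = ipaddress.ip_address(addr)
        best = None
        for net, value in self._entries.items():
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, value)
        return None if best is None else best[1]


policy = LpmMap()
policy.update("10.0.0.0/8", "allow")
policy.update("10.1.0.0/16", "deny")

print(policy.lookup("10.1.2.3"))     # matched by both prefixes; /16 wins
print(policy.lookup("10.2.0.1"))     # matched only by the /8
print(policy.lookup("192.168.0.1"))  # no matching prefix
```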
An eBPF program cannot call arbitrary kernel functions. Doing so would bind the program to specific kernel versions and complicate compatibility. Instead, eBPF programs call helper functions, a stable API provided by the kernel, and the set of available helpers continues to grow.
Helper calls allow programs to generate random numbers, get the current time and date, access eBPF maps, manipulate network packets and forwarding logic, and more.
Function calls and tail calls make eBPF programs composable. Function calls allow functions to be defined and called within a program. Tail calls execute another eBPF program and replace the execution context, much as the execve() system call does for regular processes.
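The difference between an ordinary function call and a tail call can be sketched in plain Python. In this toy model (not kernel code) a program either returns an integer verdict or asks the runner to hand control to another program in a program array, after which control never returns to the caller. The `run` function, the `("tail_call", index)` convention, and the cap of 33 chained calls are assumptions made for the illustration.

```python
MAX_TAIL_CALLS = 33  # assumed cap on chained tail calls in this toy model


def run(prog_array, index, ctx):
    """Run prog_array[index], following tail calls until a verdict is returned."""
    calls = 0
    program = prog_array[index]
    while True:
        result = program(ctx)
        if not isinstance(result, tuple):
            return result  # ordinary return: the chain ends with a verdict
        _, next_index = result
        if calls >= MAX_TAIL_CALLS or next_index not in prog_array:
            return -1  # toy behaviour: give up with an error code
        calls += 1
        program = prog_array[next_index]  # the caller is not resumed


def record(ctx, layer):
    """Ordinary function call: control returns to the caller afterwards."""
    ctx["trace"].append(layer)


def parse_ethernet(ctx):
    record(ctx, "ethernet")
    return ("tail_call", 1)


def parse_ipv4(ctx):
    record(ctx, "ipv4")
    return 0


def spin(ctx):
    ctx["runs"] += 1
    return ("tail_call", 2)  # tail-calls itself forever


ctx = {"trace": []}
verdict = run({0: parse_ethernet, 1: parse_ipv4}, 0, ctx)

spin_ctx = {"runs": 0}
spin_verdict = run({2: spin}, 2, spin_ctx)
```

The cap is what keeps a program that tail-calls itself from running forever: `spin` executes once initially plus once per permitted tail call, then the runner stops.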
eBPF eXpress Data Path (XDP) enables high-speed packet processing in BPF programs. To respond to network events as quickly as possible, XDP runs a BPF program at the earliest opportunity, typically as soon as a packet is received from the network interface.
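The decision an XDP-style program makes for each packet can be sketched in plain Python. This is a user-space simulation over raw bytes, not a real XDP program: the function names and the drop-UDP policy are assumptions for illustration, while the verdict values 1 and 2 are the numeric values of XDP_DROP and XDP_PASS in the kernel's `xdp_action` enum.

```python
import struct

XDP_DROP, XDP_PASS = 1, 2   # values of the kernel's xdp_action enum
ETH_HLEN = 14               # Ethernet header length in bytes
IPV4_MIN_HLEN = 20          # minimum IPv4 header length in bytes
ETH_P_IP = 0x0800           # EtherType for IPv4
IPPROTO_UDP = 17


def xdp_drop_udp(packet: bytes) -> int:
    """Toy verdict function: drop IPv4/UDP frames, pass everything else."""
    # Bounds check before touching any header field.
    if len(packet) < ETH_HLEN + IPV4_MIN_HLEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", packet, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    protocol = packet[ETH_HLEN + 9]  # protocol field of the IPv4 header
    return XDP_DROP if protocol == IPPROTO_UDP else XDP_PASS


def make_frame(ethertype: int, protocol: int) -> bytes:
    """Build a minimal Ethernet + IPv4-shaped frame for the example."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ip = bytearray(IPV4_MIN_HLEN)
    ip[0] = 0x45          # version 4, header length 5 words
    ip[9] = protocol
    return eth + bytes(ip)


udp_verdict = xdp_drop_udp(make_frame(ETH_P_IP, IPPROTO_UDP))
tcp_verdict = xdp_drop_udp(make_frame(ETH_P_IP, 6))
```

The explicit length check mirrors the discipline eBPF programs follow: every packet access is preceded by a bounds check.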
BPF Compiler Collection (BCC) is a toolkit for creating efficient kernel tracing and manipulation programs. It includes various useful tools and examples. It requires Linux 4.1 or above.
BCC lets you attach eBPF programs to kprobes. This permits user-defined instrumentation on a live kernel image that is designed never to hang, crash, or otherwise adversely affect the kernel.
BCC makes BPF programs easier to write, with kernel instrumentation in C (it includes a C wrapper around LLVM) and front-ends in Python and Lua. It can be used, for example, for network traffic and performance analysis.
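A minimal sketch of the BCC workflow, using the Python front-end with the kernel-side program written in C, might look like the following. It assumes BCC is installed and that the script runs with root privileges; the choice of the clone system call as the attachment point and the function name `hello` are illustrative. The import is placed inside `main()` so the file can still be read and loaded on a machine without BCC.

```python
# Kernel-side program, written in C and compiled by BCC when loaded.
PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\n");
    return 0;
}
"""


def main():
    """Load the program, attach it to a kprobe, and print its trace output."""
    from bcc import BPF  # requires BCC to be installed

    b = BPF(text=PROGRAM)
    # Attach to the kernel function that implements the clone system call.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
    # Blocks and prints lines from the kernel trace pipe until interrupted.
    b.trace_print()
```

Calling `main()` as root should print one line each time a process is created through the clone system call; press Ctrl-C to stop.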
Calico Open Source is a networking and security solution for containers, virtual machines, and native host-based workloads that supports a broad range of platforms including Kubernetes, OpenShift, Docker EE, OpenStack, and bare metal services. It has grown to be the most widely adopted solution for container networking and security, powering 1.5M+ nodes daily across 166 countries.
Calico offers support for multiple data planes, including standard Linux, Windows HNS, and Linux eBPF. Compared to the standard Linux networking data plane, Calico’s eBPF data plane scales to higher throughput, uses less CPU per GBit, and has native support for Kubernetes services (without needing kube-proxy).
The data plane’s native support for Kubernetes services achieves the following:
With Calico, you can easily load and unload the eBPF data plane to suit your needs. Calico offers you the ability to leverage eBPF as needed, as an additional control to build your Kubernetes cluster security.
Beyond the three data planes Calico currently supports, there are plans to add support for even more data planes in the near future, including Vector Packet Processing (VPP). Calico lets the user decide what works best for what they need to do.