An important part of any Kubernetes cluster is the underlying containers. Containers are the workloads that your business relies on, what your customers engage with, and what shapes your networking infrastructure. Long story short, containers are arguably the soul of any containerized environment.
One of the most popular open-source container orchestration systems, Kubernetes, has a modular architecture. On its own, Kubernetes is a sophisticated orchestrator that helps you manage multiple projects in order to deliver highly available, scalable, and automated deployment solutions. But to do so, it relies on having a suite of underlying container orchestration tools.
This blog post focuses on containers and container networking. Throughout this post, you will find information on what a container is, how you can create one, what a namespace means, and what the mechanisms are that allow Kubernetes to limit resources for a container.
A container is an isolated environment used to run an application. By utilizing the power of cgroups, namespaces, and the filesystem from the Linux kernel, containers can be allocated a limited amount of resources and their own filesystems inside isolated environments.
In Linux, processes can be isolated using namespaces. A process inside a namespace is unaware of applications that are running in other namespaces or in the host environment.
Let’s open up a Linux terminal and create a namespace.
Note: If you get a permission denied error message, try the command with sudo.
We can list the current network namespaces by issuing the following command.
ip netns list
Next, let’s create a new network namespace.
ip netns add my_ns
Now that we have a network namespace, we can assign processes to run in that isolated environment.
ip netns exec my_ns bash
If you need to verify your current namespace, use the following command.
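One way to check this, assuming a Linux host with /proc mounted, is to read the network namespace identifier of the current shell. The helper names netns_id and same_netns are made up for this sketch. iproute2 also provides ip netns identify, which prints the name of a named namespace for a given PID.

```shell
# netns_id is a hypothetical helper: it prints the network namespace
# identifier of a process (default: the current shell), e.g. net:[4026531992].
netns_id() {
    readlink "/proc/${1:-$$}/ns/net"
}

# same_netns (also hypothetical) succeeds only if both PIDs can be read
# and their namespace identifiers match.
same_netns() (
    a=$(netns_id "$1") && b=$(netns_id "$2") && [ "$a" = "$b" ]
)

echo "this shell runs in $(netns_id)"
# As root, `same_netns $$ 1` tells you whether this shell still shares the
# namespace of PID 1; inside `ip netns exec my_ns bash` it should not.
```

The identifier changes when you enter the namespace with ip netns exec, which makes it easy to confirm which terminal is inside my_ns.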
If you would like to take this further, I highly recommend that you open a new terminal and issue commands such as ip link in both windows to get a better understanding of the host and container environment.
Use the exit command to get out of the namespace.
Control groups (cgroups)
Control groups, or cgroups, allow Linux to allocate a limited amount of resources, such as memory, network bandwidth, and CPU time, to groups of processes.
Under the /sys/fs/ directory (in most Linux distros) there is a folder called cgroup that holds the configured resource limitations. You can find these limitations by listing the contents of that folder.
Back on our host machine, let’s create a cgroup folder hierarchy for our namespace.
mkdir -p /sys/fs/cgroup/pids/my_ns_cgroup
After we create a cgroup folder, Linux automatically populates it with the necessary files, following the cgroup-v1 standard. We can examine these files by listing the new folder.
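The pids/ path used above exists only on hosts that mount the cgroup-v1 hierarchy. A rough way to tell which version a host uses is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU stat and the common convention that cgroup2fs indicates v2 while tmpfs indicates a v1 hierarchy; cgroup_version is a hypothetical helper name.

```shell
# cgroup_version is a hypothetical helper: it maps a filesystem type name
# (as printed by `stat -fc %T`) to the cgroup version it usually indicates.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Probe the real mount point when possible; otherwise report "unknown".
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
echo "cgroup hierarchy: $(cgroup_version "$fstype")"
```

On a v2 host the per-controller directories such as pids/ are absent, so the mkdir path shown earlier would need to be adapted.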
Let’s add a limit to the number of processes that can run in our namespace.
echo 2 | sudo tee /sys/fs/cgroup/pids/my_ns_cgroup/pids.max
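To see how close a cgroup is to its process limit, you can compare pids.max with pids.current, both of which the pids controller exposes. The following is a minimal sketch; pids_headroom is a hypothetical helper, and the directory it reads is passed in as an argument so it can be tried against any cgroup path.

```shell
# pids_headroom is a hypothetical helper: given a pids cgroup directory,
# it prints how many more processes the cgroup may still create.
pids_headroom() (
    max=$(cat "$1/pids.max") || exit 1
    current=$(cat "$1/pids.current") || exit 1
    if [ "$max" = "max" ]; then
        echo "unlimited"          # the controller writes "max" for no limit
    else
        echo $(( max - current ))
    fi
)

# Example (on a cgroup-v1 host, after the mkdir and echo steps above):
#   pids_headroom /sys/fs/cgroup/pids/my_ns_cgroup
```

Keep in mind that the limit only applies to processes whose PIDs have been written to the cgroup's cgroup.procs file.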
A filesystem (FS) is the method and data structure used to store and retrieve information in an operating system. It is impossible to discuss the container filesystem without talking about Docker, whose archive-based image architecture is now supported by every major container runtime. This layer-based architecture for container images is arguably one of Docker's most important contributions.
A layered container image is a tar archive with a JSON-based manifest that preserves changes to the stored data files by recording each change as a named layer. In a running container, these layers merge, and a new readable and writable layer on top reflects the most recent changes.
This is possible by using the Overlay File System feature of Linux, which makes it efficient to spawn new containers quickly without wasting storage resources.
Now that we know the basics, let’s explore how Docker uses these technologies to run a container.
First, let’s check which storage driver is in use by executing the following command:
docker info | grep -i storage
Overlay2 is an Overlay filesystem implementation and the default option in most Linux-based distros.
The Overlay filesystem is a union filesystem: it merges two directories into a single unified view. These directories are referred to as the upper and lower directories.
It is important to note that lower directories are not modified in any way and it is possible for an upper directory to have multiple lower directories.
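To make the upper and lower terminology concrete, here is a sketch of how the option string for a manual overlay mount is put together. overlay_opts is a hypothetical helper and the directory names are placeholders; lowerdir, upperdir, and workdir are the option names used by the Linux overlay filesystem.

```shell
# overlay_opts is a hypothetical helper that builds the -o argument for
# `mount -t overlay`. Usage: overlay_opts UPPER WORK LOWER [LOWER...]
# Lower directories are joined with ':'; the leftmost one is the topmost.
overlay_opts() (
    upper=$1 work=$2
    shift 2
    lowers=$1
    shift
    for dir in "$@"; do
        lowers="$lowers:$dir"
    done
    echo "lowerdir=$lowers,upperdir=$upper,workdir=$work"
)

overlay_opts /tmp/upper /tmp/work /tmp/lower1 /tmp/lower2
# As root, the resulting string could be used like this:
#   mount -t overlay overlay -o "$(overlay_opts ...)" /tmp/merged
```

Writes made through the merged mount point land in the upper directory, while the lower directories stay untouched.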
Execute the following command to pull the base Ubuntu image.
docker pull ubuntu:20.04
After the pull is finished, run the following command to determine the IMAGE ID hash:
docker image ls --no-trunc
Copy the IMAGE ID hash from the previous output and run the following command to peek into the content of the manifest file.
Now that we know how to look for the manifests, let’s run an Ubuntu container and check where the data layers are stored.
Execute the following command to run an Ubuntu container:
docker run --rm -it ubuntu:20.04 /bin/bash
Now execute this command to see the mounted overlays:
grep overlay /proc/mounts
You should be able to see the location of upperdir in the output.
If you would like to experiment further, try examining the content of directories returned by the previous command in the host terminal.
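If you only want the upperdir path rather than the whole mount line, the options field can be split apart. This sketch assumes the usual /proc/mounts layout (device, mount point, type, options, and so on) and uses a made-up sample line with placeholder paths; upperdir_of is a hypothetical helper name.

```shell
# upperdir_of is a hypothetical helper: it reads /proc/mounts-style lines
# on stdin and prints the upperdir of every overlay mount it finds.
upperdir_of() {
    awk '$3 == "overlay" { print $4 }' | tr ',' '\n' | sed -n 's/^upperdir=//p'
}

# Made-up sample line; real paths on your machine will differ.
sample='overlay / overlay rw,lowerdir=/l2:/l1,upperdir=/data/abc/diff,workdir=/data/abc/work 0 0'
printf '%s\n' "$sample" | upperdir_of
# Inside the container you would run: upperdir_of < /proc/mounts
```

The printed path is the directory to inspect from the host terminal to see the files the container has written.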
The namespace that we created earlier (my_ns) is isolated, meaning there is only a loopback network interface. Let’s verify this.
From the host machine, let’s look at the interfaces in the namespace by executing the following command.
ip netns exec my_ns ip link
You should see an output similar to:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Just like a computer, a namespace requires network interfaces to communicate with other resources in a network realm. There are a variety of virtual interfaces available in Linux that can offer different functionalities depending on your desired use case. Let’s use veth to add a virtual interface into our namespace.
ip link add ns_side netns my_ns type veth peer name host_side
We can verify our new virtual interface by running the ip link command inside the namespace.
ip netns exec my_ns ip link
You should see an output similar to:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ns_side@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 56:48:4f:6f:4b:00 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Running ip link on the host should output a similar result.
5: host_side@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e2:8f:30:87:9f:25 brd ff:ff:ff:ff:ff:ff link-netns my_ns
If you look more closely at the previous output, interfaces ns_side and host_side have a DOWN status. This means both interfaces are disabled at the moment.
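Reading the state out of the full ip link output by eye gets tedious as interfaces are added. As a rough sketch, the helper below (link_states, a hypothetical name) condenses ip -o link style output, which prints one interface per line, into name and state pairs. The sample input is adapted from the namespace output shown earlier.

```shell
# link_states is a hypothetical helper: it reads one-line-per-interface
# `ip -o link` output on stdin and prints "<name> <state>" for each line.
link_states() {
    awk '{
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*/, "", name)     # drop the "@ifN" peer suffix of veth devices
        for (i = 3; i < NF; i++)
            if ($i == "state") { print name, $(i + 1); break }
    }'
}

printf '%s\n' \
  '1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000' \
  '2: ns_side@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000' \
  | link_states
# On a real system: ip netns exec my_ns ip -o link | link_states
```

For the sample input, both lo and ns_side are reported as DOWN.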
Issue the following commands to enable the logical interface for both sides.
ip link set host_side up
ip netns exec my_ns ip link set ns_side up
Add an IP address to the host.
ip address add 192.168.254.1/32 dev host_side
Add an IP address to the namespace side.
ip netns exec my_ns ip address add 192.168.254.2/32 dev ns_side
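Both the host and the namespace now have an IP address, but they cannot ping each other yet. Because each address was assigned with a /32 prefix, neither side treats the other as part of its own network, so no route between them exists. We need routes in order to establish communication between the host and the namespace.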
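To see what the /32 prefix on both addresses implies, it can help to compute whether one address falls inside the other's network for a given prefix length. The sketch below is plain shell arithmetic; ip_to_int and same_network are hypothetical helper names.

```shell
# ip_to_int (hypothetical) converts a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_network (hypothetical) succeeds if both addresses share the network
# given by the prefix length. Usage: same_network ADDR1 ADDR2 PREFIX
same_network() (
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
)

same_network 192.168.254.1 192.168.254.2 32 || echo "/32: not on the same network"
same_network 192.168.254.1 192.168.254.2 24 && echo "/24: on the same network"
```

With a /32 prefix every address is its own network, which is why explicit routes are needed before the two ends of the veth pair can talk.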
Use the following command to add a route for the host side.
ip route add 192.168.254.2 dev host_side
Use the following command to add a route for the namespace side.
ip netns exec my_ns ip route add 192.168.254.1 dev ns_side
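Put together, the manual steps from this section can be collected into one script. What follows is only a sketch: the run wrapper and the DRY_RUN variable are additions for this illustration (they are not part of iproute2), and by default the script only prints the commands so it can be inspected without root privileges. The final ping is an assumed way to confirm connectivity.

```shell
# With DRY_RUN=1 (the default) commands are printed instead of executed.
# Set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_ns_link is a hypothetical name for the sequence of steps above.
setup_ns_link() {
    run ip netns add my_ns
    run ip link add ns_side netns my_ns type veth peer name host_side
    run ip link set host_side up
    run ip netns exec my_ns ip link set ns_side up
    run ip address add 192.168.254.1/32 dev host_side
    run ip netns exec my_ns ip address add 192.168.254.2/32 dev ns_side
    run ip route add 192.168.254.2 dev host_side
    run ip netns exec my_ns ip route add 192.168.254.1 dev ns_side
    run ip netns exec my_ns ping -c 3 192.168.254.1   # assumed connectivity check
}

setup_ns_link
```

Printing the commands first makes it easy to review the order of operations before touching the host's network configuration.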
As we previously established, containers are a combination of namespaces, cgroups, and a filesystem. Kubernetes uses container runtime applications, such as Docker, CRI-O, and containerd, to manage containers in a cluster. A container runtime is essentially a way to automate the previous steps, creating and maintaining namespaces for you.
Manually creating a container is a fun experience, but it does not scale to a large datacenter. Container networking environments are software-defined networking (SDN) realms, where SDN is used to establish communication with other resources within the datacenter. In a datacenter where a massive number of containers are created and removed every second, SDN can provide the efficiency needed to process packets inside and outside of the container. In the context of Kubernetes, this is usually where the Container Network Interface (CNI) shines.
CNI is a CNCF project that provides a set of specifications and libraries to achieve networking in any environment. Anyone can use these specifications to write a plugin for any project and offer essential or advanced networking features.
Kubernetes complies with CNI specification standards to provide network connectivity for clusters. According to Kubernetes documentation, there are four areas that a CNI needs to address.
- Communication between containers
- Communication between pods
- Communication from a pod to a service
- Communication from an external resource to a local service
Calico Open Source is a networking and network security solution for containers, virtual machines, and native host-based workloads. It supports multiple data planes, including a pure Linux eBPF data plane, a standard Linux networking data plane, and a Windows HNS data plane. Calico provides a full networking stack, but can also be used in conjunction with cloud provider CNIs to provide network policy enforcement.
In this post, we walked through how to create a network namespace that runs the bash binary and hooked it to the host machine's networking interface to establish network communication between the two. Kubernetes provides many more features to orchestrate your containerized environment, and these can be daunting to replicate manually, but the point of creating one container by hand is to show how CNI allows a container to establish connections with other resources inside an environment.
Check out our container security guide to learn how to secure Docker, Kubernetes, and all major elements of the modern container stack.