In Calico v3.9, the Calico team introduced the capability to live migrate clusters from flannel networking to Calico, without application downtime. In this blog, I’d like to talk about whether this live migration is right for you, and how it works.
In a Kubernetes cluster, flannel provides a number of different networking backends for getting traffic from one node to another. The most common backend uses VXLAN encapsulation for pod traffic between different cluster nodes. Flannel itself offers basic networking features but lacks a number of capabilities provided by Calico – like flexible IP address management and network policy.
Calico does, however, provide a VXLAN data plane for pods. So you can still use VXLAN networking as well as the extra capabilities that Calico offers, all in one package.
There are a few different reasons why you might want to consider migrating your clusters from flannel to Calico. Let’s walk through a couple of them.
Network policy is a key part of building security into your Kubernetes deployments, and something that flannel does not provide natively (Calico does).
You could run Calico in policy-only mode on top of flannel networking (also known as “canal”), but there are a few downsides to this.
Firstly, this is a choice you need to make at cluster creation time, which means if you didn’t create your cluster in this way to begin with, you will need to recreate the entire cluster – this time with Calico installed as well.
Secondly, running both flannel and Calico introduces an extra moving part. You likely want to simplify your configuration by running only Calico.
Finally, while canal provides some of Calico’s features on top of flannel, it can’t enable the full Calico feature set, because the two projects make fundamentally different assumptions about cluster networking. For example, it’s missing Calico’s IP address management features discussed below.
Flannel’s networking implementation is strongly rooted in the use of the host-local IPAM (IP address management) CNI plugin. This is a simple approach to managing how IP addresses are allocated in your cluster, but it comes with a few limitations:
Migrating to Calico enables you to leverage Calico’s flexible IP address management, which solves these use-cases and more. For more information, see how to get started with IPAM.
There are two ways to switch your cluster to use Calico:
If you don’t care about downtime, or if you have the ability to migrate workloads from one cluster to another without downtime, then we recommend simply creating a new cluster using Calico and migrating your workloads. This is the easiest way to get started using Calico.
However, if you cannot move your production workloads to a new cluster, you can perform a live migration from flannel to Calico, which for most users is as simple as applying a new Kubernetes manifest.
The live migration is simple to use, but behind the scenes there are a number of things going on, all orchestrated by a purpose-built migration controller. Let’s talk about what’s going on in a bit more detail.
The migration controller has three main stages:
The first thing the controller does is check the cluster configuration to make sure a migration is possible. It looks to make sure flannel is configured properly and in a way that is compatible with Calico migration. For example, it asserts that the flannel VXLAN backend is in use.
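To make the idea of a pre-flight check concrete, here is a minimal sketch in Python. It is not the migration controller's actual code. It assumes the flannel network configuration is available as a JSON string in flannel's usual `net-conf.json` layout (a `Network` CIDR plus a `Backend` object with a `Type`), and the names `check_flannel_backend` and `MigrationNotPossible` are hypothetical, chosen only for illustration.

```python
import json


class MigrationNotPossible(Exception):
    """Raised when the flannel configuration is not one we can migrate (hypothetical)."""


def check_flannel_backend(net_conf_json: str) -> str:
    """Return the pod network CIDR if the flannel backend is VXLAN.

    Raises MigrationNotPossible for any other backend (or a missing one).
    """
    conf = json.loads(net_conf_json)
    backend = conf.get("Backend", {}).get("Type", "")
    if backend.lower() != "vxlan":
        raise MigrationNotPossible(f"unsupported flannel backend: {backend!r}")
    return conf["Network"]


# Example configuration in the net-conf.json layout (values are illustrative).
example = '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
print(check_flannel_backend(example))
```

A real check would likely look at more than the backend type, but the shape is the same: read the existing configuration, refuse to continue if anything is incompatible, and carry forward the values (such as the pod CIDR) that the new network needs to match.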
This is where most of the action occurs. Once cleared for takeoff, the controller uses node labels to perform a controlled rolling update of each node in the cluster. For each node in the cluster, the migration controller will do the following:
Drain pods from the node and prevent scheduling of new pods.
Remove flannel and its configuration from the node.
Install Calico on the node.
Re-enable pod scheduling to the node.
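The per-node steps above can be sketched as a small in-memory simulation. This is only an illustration of the ordering, under the assumption that nodes are handled one at a time. The `Node` class and the `migrate_node` and `migrate_cluster` functions are hypothetical and do not talk to a real cluster or reflect the controller's real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a cluster node (hypothetical, for illustration only)."""
    name: str
    network: str = "flannel"
    schedulable: bool = True
    pods: list = field(default_factory=list)


def migrate_node(node: Node, log: list) -> list:
    """Run the four per-node steps in order and return the evicted pods."""
    # 1. Stop new pods from landing here, then drain the existing ones.
    node.schedulable = False
    evicted, node.pods = node.pods, []
    log.append(f"{node.name}: drained {len(evicted)} pod(s)")
    # 2. Remove flannel and its configuration.
    node.network = "none"
    log.append(f"{node.name}: flannel removed")
    # 3. Install Calico.
    node.network = "calico"
    log.append(f"{node.name}: calico installed")
    # 4. Allow scheduling again.
    node.schedulable = True
    log.append(f"{node.name}: scheduling re-enabled")
    return evicted


def migrate_cluster(nodes: list) -> list:
    """Migrate nodes one at a time, skipping any already on Calico."""
    log = []
    for node in nodes:
        if node.network != "calico":
            migrate_node(node, log)
    return log


nodes = [Node("node-a", pods=["web-1", "web-2"]), Node("node-b", pods=["db-1"])]
for line in migrate_cluster(nodes):
    print(line)
```

Because only one node is out of service at any moment, the drained pods have somewhere else to run, which is what keeps the applications available during the rollout.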
The migration controller makes sure to configure Calico so that nodes running Calico and nodes still running flannel will continue to work together, leaving the cluster fully operational throughout the migration.
Once complete, your cluster will be using solely Calico for networking. Calico’s IPAM will now manage pod IP allocations, and the full set of Calico features will be available to use.
At this stage, all the nodes are successfully running Calico, and flannel is no longer needed. The controller removes the flannel DaemonSet (which no longer controls any pods) and exits.