High-availability connectivity for Kubernetes with dual ToR

Dual ToR (top-of-rack) peering provides a redundant network path for clusters whose applications cannot tolerate service downtime or failure and therefore need a high-availability solution. Calico has supported ToR connectivity for some time; Calico Enterprise now also supports connectivity with dual ToR switches. Operationally, a cluster that is peered with two ToR switches still has an active link if one switch becomes unavailable, so the cluster keeps its network connection. Because each rack has two ToR switches, the whole setup is often referred to as “dual ToR.”

More specifically, Calico:

  • Enables cluster operators to connect with, and take advantage of, dual ToR switches
  • Provides two active, independent planes of connectivity between cluster nodes when a dual-plane cluster is connected via dual ToR switches
  • Automates the process of bootstrapping and configuring BGP peering between cluster nodes and ToR switches before Kubernetes networking is started and the Calico BGP daemon (BIRD) takes over

Calico’s HA Kubernetes solution architecture

Each rack has two independent ToR switches, and each node in the rack is connected to both of those switches over independent Ethernet links.
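
To make the topology concrete, here is a minimal toy model of a single rack. It is only a sketch: the node and switch names (`node-1`, `tor-a`, and so on) are hypothetical, and it models link reachability only, not BGP or anything Calico-specific. It checks that every pair of nodes still shares a working switch when one ToR switch fails.

```python
from itertools import combinations


def build_rack(node_names, switches=("tor-a", "tor-b")):
    """Return the set of (node, switch) links: every node links to every switch."""
    return {(node, sw) for node in node_names for sw in switches}


def nodes_connected(links, node_names, failed_switches=()):
    """Return True if every pair of nodes shares at least one working switch."""
    up = {}
    for node, sw in links:
        if sw not in failed_switches:
            up.setdefault(node, set()).add(sw)
    # A pair is connected when the intersection of their working switches is non-empty.
    return all(
        bool(up.get(a, set()) & up.get(b, set()))
        for a, b in combinations(node_names, 2)
    )


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
links = build_rack(nodes)

print(nodes_connected(links, nodes))                                    # both switches up
print(nodes_connected(links, nodes, failed_switches={"tor-a"}))         # one switch down
print(nodes_connected(links, nodes, failed_switches={"tor-a", "tor-b"}))  # both down
```

With both switches up, or with either one down, the nodes in this model stay connected; only losing both switches isolates them.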

Benefits of dual ToR switch connectivity

For cluster operators using BGP who need reliable, consistent connectivity to resources outside the cluster, as well as to cluster nodes on other racks, Calico Enterprise dual ToR peering offers several benefits.

  • Ensures high availability with active-active redundant connectivity planes between cluster nodes and ToR switches. Kubernetes cannot do this on its own.
  • Prevents service downtime: if a link or software component fails somewhere in one of the planes, cluster nodes can still communicate over the other plane, and the cluster as a whole continues to operate normally
  • Eliminates a complex, time-consuming manual process and reduces operational overhead by automating the bootstrapping and configuration of BGP peering between cluster nodes and ToR switches

Deploying Calico’s high-availability solution

We have automated parts of the process, which greatly simplifies deployment. Still, for a dual ToR cluster to operate seamlessly when one of the planes breaks, there are some prerequisites. To ensure rapid outage detection, a dual ToR cluster needs Calico BGP peer resources that specify how each node should peer with its ToR switches. The remaining parts of the dual ToR network design are implemented as properties of those BGP peerings, and as corresponding properties of the BGP configuration between and within the ToRs and the core infrastructure.
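
As a rough illustration of what those BGP peer resources could look like, the sketch below generates one Calico `BGPPeer` manifest per ToR switch for a rack. The rack label (`rack == 'ra'`), the switch IP addresses, the AS number, and the helper name `bgp_peers_for_rack` are all placeholder assumptions chosen for this example. Only the basic `nodeSelector`, `peerIP`, and `asNumber` fields are shown; the additional peering properties that a real dual ToR deployment needs are left to the deployment guide.

```python
import json


def bgp_peers_for_rack(rack, tor_ips, as_number):
    """Build one BGPPeer manifest per ToR switch for the nodes in a rack.

    All argument values are supplied by the caller; nothing here is
    discovered from a real cluster.
    """
    peers = []
    for plane, ip in enumerate(tor_ips, start=1):
        peers.append({
            "apiVersion": "projectcalico.org/v3",
            "kind": "BGPPeer",
            "metadata": {"name": f"{rack}-tor{plane}"},
            "spec": {
                # Apply this peering only to nodes labeled as part of the rack
                # (the label key "rack" is an assumption).
                "nodeSelector": f"rack == '{rack}'",
                "peerIP": ip,
                "asNumber": as_number,
            },
        })
    return peers


# Placeholder values: two ToR switches, one per connectivity plane.
peers = bgp_peers_for_rack("ra", ["172.31.11.100", "172.31.12.100"], 65001)
print(json.dumps(peers, indent=2))
```

Each node in the rack ends up with two peerings, one per ToR switch, which corresponds to the two connectivity planes described above.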

 

Looking for more information? “Deploy a Dual ToR Cluster” provides detailed instructions for deploying a dual-plane cluster that gives your workloads redundant connectivity in on-premises deployments.
