Most public cloud projects start off fairly well scoped: a single cluster with limited or no connectivity requirements back to existing corporate resources or data centers, and similarly limited connectivity requirements external to the cluster.
However, those initial projects soon grow into more general production platforms, with requirements such as resiliency (multiple clusters), connectivity to legacy data centers, and so on.
While there are multiple ways of providing the necessary connectivity, one option leverages the same simple, scalable networking constructs that we use here at Tigera.
Let’s look at that option, and see how it might help with the real-world networking requirements of scalable public cloud or hybrid cloud environments.
Hybrid Cloud Connectivity
It’s important to remember that different users have different requirements. To address the foundational requirement of establishing IP connectivity between clusters, a user can either implement a VPN connection into a cluster hosted in the public cloud or use a direct connection (e.g. AWS Direct Connect).
When using a VPN, the connection goes over an untrusted network, using a company’s Internet connection(s) to establish connectivity. On the corporate side, the VPN typically terminates on a device such as an edge router, a virtual router, or a Linux host running VPN software. On the public cloud side, a similar virtual device can be used, such as a proprietary virtual router, a managed VPN service (i.e. a virtual private gateway), or a self-built instance running VPN software.
When using a direct connection, a customer extends their data center connectivity using a standard 50 Mbps to 10 Gbps Ethernet connection from the customer’s facility (or facilities) to a cloud provider point of presence (POP). From there, the traffic is forwarded directly to the customer’s VPC(s). In the case of AWS, this requires a virtual private gateway and a virtual interface on the AWS side, and routes are advertised between the two environments via BGP.
The rest of this blog uses AWS as an example — however it also applies to other public cloud providers that offer direct interconnection services.
The transit routing function is performed within a single VPC (called a transit VPC) that connects the private data center to every VPC deployed in AWS. This forms a hub-and-spoke topology in AWS, with the routing handled by one or more virtual routers in the transit VPC. BGP is used between the private data center and the AWS infrastructure to advertise routes.
There are several transit routing implementations available:
- Proprietary, commercial virtual routers
- Free and/or Open Source virtual routers
- An Open Source router on Linux using Calico with policy
The Calico-based router with policy option is best for cases where more control and flexibility is desired and/or open source technology is specifically desired.
Using Calico To Connect Clusters in Different VPCs
Calico with policy can run on Linux alongside open source routing software. With this option, Kubernetes clusters in different locations can use Calico to exchange routes over BGP.
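As a sketch of what that looks like on the cluster side, a BGPPeer resource tells Calico nodes to peer with the transit router (the peer address and ASN below are placeholders for illustration):

```yaml
# Hypothetical example: peer every Calico node with the transit router.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: transit-router
spec:
  peerIP: 10.0.1.10   # transit router address (placeholder)
  asNumber: 64512     # transit router ASN (placeholder, private ASN range)
```

With no node selector, this peering applies globally to all nodes; it can be applied with `calicoctl apply -f`.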
Important AWS components to consider when designing such a topology include:
- Connection to AWS (discussed above).
- Virtual private gateways and virtual interfaces in case of Direct Connect.
- Direct Connect requires the use of a private virtual interface and at least one 802.1Q (VLAN) tag ID to connect to the AWS infrastructure.
- Routing Tables — a routing table represents the implied router for each VPC segment. At least one is required in a VPC but more can be used.
- Subnet — for the router in the transit VPC, a unique subnet is used per interface.
- VPC BGP Peering Connection — this needs to be configured for each spoke VPC to connect to the transit VPC.
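As an illustrative sketch of the Direct Connect piece, the private virtual interface can be created with the AWS CLI (every ID, the VLAN tag, and the ASN below are placeholders for your own values):

```shell
# Create a private virtual interface on an existing Direct Connect connection.
# connection-id, VLAN tag, ASN, and gateway ID are all placeholders.
aws directconnect create-private-virtual-interface \
  --connection-id dxcon-xxxxxxxx \
  --new-private-virtual-interface \
    virtualInterfaceName=transit-vif,vlan=101,asn=65000,virtualGatewayId=vgw-xxxxxxxx
```

The ASN given here is the one your on-premises (or transit) router will use for the BGP session with AWS.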
Other considerations to keep in mind include:
- Routing table sizing and performance — enough memory should be allocated to the instance running the routing function. The Linux kernel already forwards IP packets efficiently, but the number of interfaces and spoke VPCs should be accounted for.
- Availability of routers in the transit VPC — when peering with multiple spoke VPCs, there should be one, or preferably two, routers in the transit VPC. This allows for multiple paths from the private data center to the spoke VPCs.
Benefits of the Calico-based approach include using BGP configuration for custom routing behavior within AWS, as well as router protection (using host endpoint protection). It is implemented by installing the Calico components on an instance running a supported OS such as Ubuntu or CentOS.
Routing with Linux
To turn a Linux VM into a router, we just have to run a routing protocol daemon on it. Project Calico already includes a routing daemon, so that is what we use here. It exchanges routes with its peers and installs them in the Linux kernel as necessary, enabling full transit router behavior on the Linux host.
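As a rough sketch of the host preparation (package name and Ubuntu commands are assumptions for illustration), the VM needs kernel IP forwarding enabled and the routing daemon running:

```shell
# Enable IPv4 forwarding so the kernel routes packets between interfaces
sudo sysctl -w net.ipv4.ip_forward=1
# Persist the setting across reboots
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf

# Install and start the BIRD routing daemon (Ubuntu example)
sudo apt-get install -y bird
sudo systemctl enable --now bird
```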
Under the hood, Calico makes use of the BIRD Internet routing daemon, which reads its BGP configuration from a file found under /etc/bird/bird.conf. In our topology, the BGP configuration would, at a minimum, specify the following parameters:
- Router’s local ASN
- Each neighbor and their ASN (2 in our topology)
- Any export filters
- Source address
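A minimal bird.conf along those lines might look like the following (all ASNs and addresses are placeholders; the syntax shown is BIRD 1.x):

```
# /etc/bird/bird.conf (sketch)
router id 10.0.1.10;

# Push routes learned via BGP into the Linux kernel routing table
protocol kernel {
  export all;
}

# Track interface state
protocol device {
}

# Peer with the private data center edge router
protocol bgp datacenter {
  local as 64512;
  neighbor 192.168.100.1 as 65000;
  source address 10.0.1.10;
  export all;   # export filter: advertise everything (tighten in practice)
}
```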
For large configurations and multi-instance deployments, it’s recommended to use BIRD templates to group common options together. This is a common technique among users customizing BGP configurations for their environments.
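For example, options shared by all spoke VPC peers can be grouped in a template and inherited per neighbor (protocol names, addresses and ASNs here are placeholders; BIRD 1.x syntax):

```
# Shared settings for all spoke VPC peers
template bgp spokes {
  local as 64512;
  import all;
  export all;
}

protocol bgp spoke_a from spokes {
  neighbor 10.1.0.1 as 64513;
}

protocol bgp spoke_b from spokes {
  neighbor 10.2.0.1 as 64514;
}
```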
Router Protection with Calico Host Endpoint Protection
Host endpoint protection is a great feature for locking down and securing nodes that are not necessarily hosting workloads, such as bare metal servers or routers. Here is an excellent blog post explaining this in greater detail.
The way we would use Calico host endpoint protection for a router is to further secure the management plane and the data plane traffic flowing through the router. For the management plane, we can centrally define which protocols are allowed and from where.
For the data plane, we can define:
- Which router interfaces are allowed to route to which others (e.g. eth0 to eth1).
- Which specific workload traffic can be accepted into the router by using labels. These labels can match specific workloads, host names or application names being hosted in AWS. Note that policy for the transit router can be defined using an existing Calico install that may be in the private data center side or on AWS itself.
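Sketching this out, the router’s interface would be registered as a HostEndpoint and a policy attached to it via labels (all names, addresses, and subnets below are placeholders; note that the BGP sessions themselves need TCP port 179 allowed):

```yaml
apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: transit-router-eth0
  labels:
    role: transit-router
spec:
  node: transit-router-1    # placeholder node name
  interfaceName: eth0
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: transit-router-lockdown
spec:
  selector: role == 'transit-router'
  ingress:
    # Management plane: SSH only from the corporate management subnet
    - action: Allow
      protocol: TCP
      source:
        nets: ["192.168.10.0/24"]
      destination:
        ports: [22]
    # Data plane: BGP sessions from known peers only
    - action: Allow
      protocol: TCP
      source:
        nets: ["192.168.100.1/32", "10.1.0.1/32"]
      destination:
        ports: [179]
```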
Application connectivity is important to plan for when establishing a hybrid cloud strategy. There are several options available today but most lack the flexibility and policy enforcement that modern applications require. Using Calico as a transit router begins to address some of these requirements and opens up a slew of excellent possibilities. Stay tuned for more around this emerging pattern using Kubernetes and Calico for public cloud connectivity.