Why BGP?

… and why not an IGP (OSPF or IS–IS) is a question we occasionally get asked when presenting on Project Calico. The person asking usually thinks this is a single question, but it is actually two related but distinct questions. To understand why, and to understand the Calico answer to both of those questions, we need to look a bit at how BGP and IGPs are used in large–scale networks today.

Where (and why) are certain protocols used in a large network?

Any network, but especially a large network, has two different routing issues it needs to deal with:

1) Discovering the inter–router topology of the network

2) Discovering the working end points in the network, or externally connected to it

It might help if we included a visual aid:

Diagram illustrating a network with multiple nodes, highlighting inter-router topology and external endpoint connections

A couple of notes about the diagram. We are mainly interested in “The Network”. The infrastructure links in our network are colored red, as are the routers. The end points in our network are colored orange, as are the links between the end points and the routers. End points are not capable of running a routing protocol, while routers are.

Our network is connected to “Another Network” via a network–network link colored magenta. The router in the other network that connects to our network is colored blue. That network also has end points, which are cyan in the diagram. Since that network is not ours, we really can’t see what is going on inside it. All we can really tell is that it is telling us about its end points, and that we are connected to one of its routers.

This brings up the concept of an Autonomous System (AS). Our network is an AS, as is the other network. While an in–depth conversation about ASes is outside the scope of this post, it is useful to think of them as administrative boundaries. You know a lot more about what is happening in your AS than in another AS, and that is what the dashed lines in the other network represent: we don’t know how those end points are connected to the router that we know about (our peer), and really, we shouldn’t have to care.

Infrastructure routing within an AS

In a network made up of an incomplete graph of routers,¹ the routers need to know the topology of the network so that they can send traffic across the network of routers to reach the intended destination. This is where an IGP, such as IS–IS or OSPF, is used.
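To make the idea of an incomplete graph concrete, here is a minimal Python sketch. The router names and links are hypothetical and follow the A-to-Z example in the footnote. The search is a plain breadth-first search with every link treated as equal cost. Real link-state IGPs such as OSPF and IS–IS flood link state and run a shortest-path-first calculation over link costs, so treat this as a simplification rather than how an IGP is implemented.

```python
from collections import deque

# Hypothetical router adjacency (names follow the footnote's A-to-Z example).
links = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "X"},
    "X": {"C", "Z"},
    "Z": {"X"},
}

def is_complete(graph):
    """True if every router is directly linked to every other router."""
    return all(len(neighbors) == len(graph) - 1 for neighbors in graph.values())

def shortest_path(graph, src, dst):
    """Breadth-first search: a fewest-hops path from src to dst, or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

print(is_complete(links))              # False
print(shortest_path(links, "A", "Z"))  # ['A', 'B', 'C', 'X', 'Z']
```

Because the graph is incomplete, A can only reach Z if it knows about the intermediate routers, which is exactly the knowledge infrastructure routing has to provide.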

IGPs perform quite a lot of complex calculations to derive a shared view of the network topology at any point in time. Doing so limits the scale of a network in which IGPs can operate. A single view in an IGP should only include tens (in some extreme cases, low hundreds) of routers to meet those scale and performance requirements. There are techniques that allow that scale limit to be extended, such as using areas in OSPF or levels in IS–IS (both of which can be thought of as a single view of part of the network). However, these techniques also impose architectural limitations, the discussion of which is beyond the scope of this post.

IGPs are also limited in the number of end points that they can feasibly advertise. This number is a bit more flexible, but usually tops out at thousands or low tens of thousands of end points. Many large network operators get nervous when the number of routes in an IGP extends beyond five or six thousand.

In some networks with directly connected routers, as in the Calico Ethernet fabric option, an IGP isn’t even necessary, as the infrastructure is logically a complete graph. Even if the graph is incomplete, an IGP may not be necessary if the topology is simple.

Foreign end point advertisement

I use the term foreign fairly loosely here. A foreign end point is an end point that is not, in itself, a router within the AS boundaries. This could be because it isn’t a router (a host, for example) or because it is a router in a foreign AS. In this case, I don’t really need to know how to get to the end point (the router may not even know, as the end point can’t communicate via a routing protocol); the router just needs to know where the end point is. Once the router knows where it is, it can use infrastructure routing to send the packet on its way.
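The split between knowing where an end point is and knowing how to get there can be sketched as a two-step lookup. Everything in this Python example is hypothetical: the addresses, the router names, the link names, and the two tables themselves are invented for illustration and are not taken from Calico.

```python
import ipaddress

# Hypothetical end point advertisements: prefix -> router that announced it.
endpoint_routes = {
    ipaddress.ip_network("10.0.1.5/32"): "router-b",   # a host behind router-b
    ipaddress.ip_network("10.0.2.9/32"): "router-c",   # a host behind router-c
    ipaddress.ip_network("192.0.2.0/24"): "border-1",  # learned from a foreign AS
}

# Hypothetical infrastructure routes on router-a: target router -> outgoing link.
infrastructure_routes = {
    "router-b": "eth0",
    "router-c": "eth1",
    "border-1": "eth1",
}

def resolve(destination):
    """Return (owning router, outgoing link) for a destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in endpoint_routes if address in net]
    if not matches:
        return None
    # Step 1: where is the end point? The longest matching prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    router = endpoint_routes[best]
    # Step 2: how do we reach that router? Ask infrastructure routing.
    return router, infrastructure_routes[router]

print(resolve("10.0.1.5"))     # ('router-b', 'eth0')
print(resolve("192.0.2.77"))   # ('border-1', 'eth1')
print(resolve("203.0.113.1"))  # None
```

The first table can grow very large without touching the second, which only needs one entry per router.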

The number of end points in this application can be orders of magnitude greater than in the infrastructure case above. The public Internet, as of the end of January 2015, advertises over 526,000 routes,² and some networks can have over one million end points in them. This is definitely the range that you would see in a routed scale–out network fabric.

BGP was developed to meet this requirement (among others). It can scale up to hundreds of routers in a network, and tens of thousands if a technology called BGP route reflection is used. It can also advertise millions of routes and, if needed, manage them via a very flexible policy infrastructure.
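A little arithmetic shows why route reflection changes the router count so much. BGP routers inside one AS conventionally peer with each other in a full mesh, and route reflectors relax that requirement. The Python sketch below counts sessions under two assumptions of mine: every client peers with every reflector, and the reflectors are fully meshed among themselves. The choice of two reflectors is purely illustrative.

```python
def full_mesh_sessions(routers):
    """Sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2

def route_reflector_sessions(routers, reflectors):
    """Sessions when each client peers with every reflector and the
    reflectors are fully meshed among themselves (an assumed layout)."""
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)

for n in (100, 1_000, 10_000):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
# 100 4950 197
# 1000 499500 1997
# 10000 49995000 19997
```

The full mesh grows with the square of the router count, while the reflected design grows roughly linearly, and each client holds only as many sessions as there are reflectors.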

Some networks do conflate these two routing functions by including their foreign end points in their infrastructure (IGP) routes. While this does present issues at scale, it works fine for many smaller, simpler networks. Basically the network is stating that its non–router end points are part of the infrastructure. This is not a problem if the scale of the end points is small and/or they are aggregatable, neither of which usually applies to a scale–out fabric.
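To show what aggregatable means in practice, this short Python sketch uses the standard library's `ipaddress.collapse_addresses`. The prefixes are made up for illustration. Contiguous blocks collapse into one route, while individual addresses scattered across the address space do not collapse at all.

```python
import ipaddress

def aggregate(prefixes):
    """Collapse a list of prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Four adjacent /26 blocks (hypothetical), which together form one /24.
contiguous = ["10.1.0.0/26", "10.1.0.64/26", "10.1.0.128/26", "10.1.0.192/26"]
print(aggregate(contiguous))  # ['10.1.0.0/24']

# Individual end point addresses (hypothetical) with nothing in common.
scattered = ["10.1.0.7/32", "10.4.9.21/32", "10.7.3.200/32"]
print(aggregate(scattered))   # ['10.1.0.7/32', '10.4.9.21/32', '10.7.3.200/32']
```

In the first case the routing table carries one entry instead of four. In the second, every end point costs a route of its own.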

But OSPF is easy, and BGP is hard and complex

Actually, that’s not quite correct. If anything, because BGP is not really interested in the full topology, BGP in its simplest application is simpler to configure, run, and troubleshoot than OSPF or IS–IS. The protocol itself is pretty simple. However, in large transit³ networks and very complex terminal networks,⁴ the operator uses BGP policy to provide security and apply business logic to the traffic flowing through the network. Because that policy can be quite complex, managing it can be quite complex as well. If the policy is simple, or if there is no policy, BGP configuration is usually just a matter of telling a BGP router who its peers are.

So why did Calico select BGP over an IGP?

Let’s address this question as two separate questions.

Why does Calico use BGP for end point advertisement?

One of the design briefs for Project Calico was to use the tools and techniques of the public Internet for scale–out network fabrics. The primary reason for this is that the industry has decades of experience in running truly large networks, and the tools and techniques have been honed over that time. It would be somewhat silly to toss all of that work in the rubbish bin and go through often painful learning events again. As the scale–out world approaches Internet scale end point networks, we should, therefore, be using the same tools. A VM based cloud today could easily host thousands to tens of thousands of compute servers in a pod, and tens of thousands or even low hundreds of thousands of VMs (end points in Calico terminology) in that same pod. A container based cloud might increase the end point count by an order of magnitude or two.

In the Calico design, this would equate to tens of thousands of routers, and potentially millions of routes or end points. These numbers are not consistent with using an IGP, but do fit in the envelope for BGP, especially when we use route reflection to improve the router scaling number.⁵
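The comparison can be sketched as back-of-the-envelope arithmetic in Python. The two thresholds are my reading of the rough figures quoted earlier in this post ("low hundreds" of routers, "five or six thousand" routes) and are not hard protocol limits. The pod sizes are hypothetical picks from the ranges mentioned above.

```python
# Rough IGP comfort limits, read off the figures earlier in this post.
# These are assumptions for illustration, not hard protocol limits.
IGP_MAX_ROUTERS = 300
IGP_MAX_ROUTES = 6_000

def pod_scale(servers, endpoints_per_server):
    """Each compute server acts as a router; each end point is a route."""
    return {"routers": servers, "routes": servers * endpoints_per_server}

def fits_igp(scale):
    """True if both counts stay within the assumed IGP comfort limits."""
    return (scale["routers"] <= IGP_MAX_ROUTERS
            and scale["routes"] <= IGP_MAX_ROUTES)

small_pod = pod_scale(servers=50, endpoints_per_server=40)
vm_pod = pod_scale(servers=10_000, endpoints_per_server=10)
container_pod = pod_scale(servers=10_000, endpoints_per_server=100)

print(small_pod, fits_igp(small_pod))
print(vm_pod, fits_igp(vm_pod))
print(container_pod, fits_igp(container_pod))
```

Only the small pod stays inside the assumed limits. Both of the larger pods exceed the router and the route figures by a wide margin.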

In short, BGP is the only viable option for this component of routing in a Calico network.

How about infrastructure routing?

This is a case where we haven’t ruled against IGPs; it’s just that, in many cases, an infrastructure routing protocol is not required in a Calico network. In the Ethernet fabric Calico interconnect model, all of the compute server vRouters are directly visible to one another, and therefore an IGP just is not necessary.

In the case where an IP fabric⁶ is used to interconnect the compute servers, it may be necessary for the Calico compute server vRouters to participate in some form of IGP, if the interconnect fabric design is especially complex.

It should be noted that, in the case of an IP fabric interconnect, those intermediate routers/switches must also be members of the BGP routing topology.

Summary

To summarize, we use BGP to advertise the end points in a Calico network because it is:

Simple
Current industry best practice
The only protocol that will sufficiently scale

The use of an IGP for infrastructure routing is dependent on the interconnect fabric that is chosen for your Calico installation.

Hopefully this answers the “Why BGP?” question(s).

As always, we would love your questions or feedback.

¹ By incomplete graph I mean a network where not every router is directly connected to every other router. In such a network it may be (and most probably is) necessary for two routers to use intermediate routers to communicate with one another. For example, if A wants to send a packet to Z, it may have to send it through B, C, and X first.
² In the case of the Internet (or other external networks), a route is not an individual end point, but an aggregation of end points that share the same path.
³ A transit network is a network that carries traffic that it neither originates nor terminates. Level 3 and AT&T are examples of transit networks.
⁴ A terminal network is one that does not transit traffic, but either originates it or terminates it. Most networks on the Internet are terminal networks. However, some terminal networks are as complex as even the largest transit networks, and have similarly complex routing policies.
⁵ BGP route reflection is a key part of the Calico architecture.
⁶ By an IP fabric, I mean the use of L3 routers or switches to interconnect the Calico compute servers. There will be a post on this architecture in a few days, and this document will be updated to point at that document.