What’s New in Tigera Calico: An Update on Recent Features & Enhancements


Free and open source, Tigera Calico delivers secure networking for the cloud-native era, and 2019 has seen many major enhancements to the most widely deployed networking and network security solution for Kubernetes.

From version 3.3, released in November 2018, all the way through 3.8, released this July, Calico has advanced significantly with features our community has requested, such as:

  • IP address management (IPAM) features that make it more configurable, including support for assigning a given IP pool to one or more Kubernetes namespaces
  • Features that give more control and allow much finer-grained dynamic IP management, versus native Kubernetes' static allocation of a fixed set of addresses to each node
  • Native support for VXLAN encapsulation
  • Optimized denial-of-service protection for host endpoints using XDP
  • Namespaced NetworkSets
  • And more…

Watch this on-demand technical webinar to learn more about these new features, with real-world examples of how and why you'd want to use them to improve the network security of your Kubernetes environment.

Welcome, everyone, to the webinar. I'm looking forward to walking through the material with you today. We're going to do a brief overview of what Calico is and how it works, which I think should be interesting to folks on the phone, and then we'll dive right into some of the new features. Without much further ado, let's jump into it.

Calico, for those who are familiar with it, has basically become the de facto standard for networking in Kubernetes. Think of it as networking for containers. It's an open-source project, commercially backed by Tigera. As far as we know, it powers a little over a hundred thousand Kubernetes clusters at this point, and we're definitely happy and proud of the work we've done with that open-source project. You might ask, "Okay, Tigera, Calico, great, what's the difference?" First and foremost, Tigera Calico is open source, and it's obviously scalable; at the biggest scales, we've seen it run on over 5,000 nodes.

So, where does Tigera Secure fit in? Tigera basically productizes the Calico open-source project. Anything you would need in order to roll Calico into production is essentially what we accomplish with Tigera Secure. In terms of visibility: it's great that I have policies, it's great that I have YAML, but let me actually view them in a GUI. Why don't I also leverage tiering to have some sane constructs around managing all these policies? Let's grab those flow logs and dump them into Splunk or Elasticsearch and actually do some machine learning on them. Those are various aspects we attempt to productize [inaudible 00:03:59] open-source project, and we're trusted by a myriad of companies out there. For a company that's only three and a half years old, it's truly an enviable customer list of folks who trust our tech and appreciate what we're doing in terms of productizing open-source Calico.

We also have strategic partnerships with all the cloud providers: Azure, AWS, Google, and IBM all essentially allow Calico to be deployed. Google actually takes it a step further and not only leverages us for the authorization but also for the CNI, which is really cool. We're also happy to be working with a lot of the on-prem companies, Red Hat and Docker, which allow our software to be deployed, fully supported, and integrated with those Kubernetes stacks.

"What can Calico do for me?" At a basic level, it hands out IPs to your pods. You have a distributed system, with distributed applications running on these different hosts: how do I get an IP to a pod, and how do I route between pods? That's essentially what Calico does. But because we're doing the routing for you, we also allow policies to be deployed that dictate which pods can talk to which pods. And we take it up the stack into Layer 7, leveraging its [inaudible 00:05:33] as well, in terms of actually being able to secure HTTP endpoints.

This slide is just a visual representation of that. I guess the text didn't come across, but essentially the first two rows are common across Calico and Tigera Secure, and the latter rows, the management UI, the continuous compliance, and so forth, are all Tigera Secure-specific. So let's do a brief overview of how Calico works.
It's nice to talk about new features, but it's also nice to lay some groundwork in terms of how Calico actually works, so we'll spend the next 10 minutes walking through some of this basic functionality.

Let's start with a host. At its most basic level, we have a host, and we have a pod on that host. We're going to give that pod an IP, 1.2.3.4. If we're on the host and we want to route to that pod, how do we do it? The first thing that needs to get set up is a veth pair: the pod itself has an eth0 interface, and that gets paired up with a cali interface on the actual host. Once that basic plumbing is set up, we can create a route on the host, so any traffic that needs to go to 1.2.3.4 gets pushed through the cali interface and into the pod. That's basic routing to the pod from the host.

We're obviously working in a distributed environment, so we're probably going to have multiple hosts. Say Host 1 is 172.0.0.1 and Host 2 is 172.0.0.2. We're going to have a network; we're not going to be particularly specific about the type of network, whether it's Layer 2 or Layer 3 adjacency, just a network that allows us to route from one host to the other. If we take a look at the route table on that other host, we're going to see a 1.2.3.4 entry, and to get to that IP we have to route over to 172.0.0.1, basically Host 1. We now have a way to route both locally and from other hosts to Host 1, and specifically to that pod. This is what a distributed route table would look like. Obviously, if we have multiple pods, we'd have a similar setup for each of them. So if we have a second pod on that second host, 5.6.7.8, you'll similarly see a route to the cali interface on Host 2, and from every other host we'll route to Host 2 for that pod IP and then on to the cali interface. This is a microcosm, if you will, of what your route tables look like across your Kubernetes cluster.

So let's talk about how we actually make these routes happen. We'll start with how the local routes are set up, and that's where the CNI comes into the picture. The pod gets scheduled onto the host, the host calls the CNI, and the CNI creates that veth pair and that initial route. That's how we route from the host to the pod. But again, how do we distribute these routes across the other hosts? That's where our friend BIRD comes into the picture. BIRD is an open-source project that does a lot of things, but we specifically leverage it for BGP, to distribute the routes across multiple hosts. What that means is that once a route gets created on the host, BIRD watches for it, understands that there's a new route it needs to propagate throughout the network, communicates with the other BIRDs in the cluster, and those BIRDs then create that route locally on their own boxes.
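To make that picture concrete, here is roughly what those route tables can look like on a Linux host running Calico. The interface names are illustrative, the IPs are the ones from the example above, and the exact output varies by environment and Calico version:

    # On Host 1 (172.0.0.1): the local pod is reached through its cali interface
    $ ip route
    1.2.3.4 dev cali1a2b3c4d5e6 scope link

    # On Host 2 (172.0.0.2): Host 1's pod is reached via Host 1 over a route learned
    # from BGP (BIRD), while Host 2's own pod (5.6.7.8) goes out its local cali interface
    $ ip route
    1.2.3.4 via 172.0.0.1 dev eth0 proto bird
    5.6.7.8 dev cali9f8e7d6c5b4 scope link

In practice Calico mostly advertises whole /26 blocks (more on those below) rather than one /32 per pod, which keeps the route tables small, but the shape of the routes is the same.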
Once we get to this picture, we have basic routing and some basic functionality: the on-host creation of routes, and the distribution of those routes across multiple hosts via BIRD. There are some more questions we have to ask, though: where do these routes get stored, where does policy come into the picture, how do IPs get handed out, how does BIRD know which nodes to distribute to? That's really where the scratchpad comes in. It's etcd in Kubernetes' case, though we're not too hung up on what the data store is, we just need a key-value store, but etcd is a great fit and typically what is being used in Kubernetes clusters nowadays. We're going to configure etcd via calicoctl or kubectl; that's how we create the IP pool and push policies into Calico. There's some overlap between what we can do with calicoctl and kubectl. For example, all the policies can be applied via kubectl, while a lot of the [inaudible 00:11:30], the peering, the IP pools, the [inaudible 00:11:34], all of those sorts of things would be managed via calicoctl.

The first thing we really pull out of etcd, or out of the key-value store, via ConfD, is all of the nodes and all of the BGP configuration. BIRD is going to learn about every other node in the cluster from etcd. ConfD is a small open-source project that basically watches etcd and can dynamically update the configuration in BIRD. So if the cluster is growing or shrinking, scaling up or scaling down, etcd reflects those changes, ConfD watches for them and pushes the changes into BIRD, and if BIRD needs to propagate routes to new hosts, it can go ahead and do that.

The other key component here is policy enforcement. You're going to create policies that say, based on labels, this pod can or cannot talk to this other pod. That gets stored in the key-value store, etcd typically, and is pushed into the local Felix. Felix then leverages iptables, or eBPF in the latest versions of Calico, to enforce these policies. One of the other things the CNI does for us is push metadata into etcd. For example, once a pod gets scheduled, there's some very important metadata we need to push back into etcd: the name of the pod, any associated labels, the IP. All that information goes into etcd and is then leveraged by Felix in order to do the policy enforcement. That's really where the mapping of labels to IPs comes to bear, because, again, you don't want to create policies that deal with static IPs; you want to manage policies that deal with labels, a much cleaner construct for handling policies within a Kubernetes cluster.

There's another aspect here that we haven't mentioned yet: how do we actually assign IPs? There's another component called IPAM, IP Address Management, which basically understands what IPs are available and assigns them to pods. The way Calico works, we wouldn't want to just grab IPs a /32 at a time, because you can think about what that would do to our route table.
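Before going further into IPAM, here is a minimal sketch of the kind of label-based policy Felix ends up enforcing. The policy name, labels, and port below are hypothetical; it could be applied with calicoctl apply -f (or, as an equivalent Kubernetes NetworkPolicy, with kubectl apply -f):

    # Allow pods labeled role=frontend to reach pods labeled role=backend on TCP 6379
    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
      namespace: default
    spec:
      selector: role == 'backend'
      ingress:
        - action: Allow
          protocol: TCP
          source:
            selector: role == 'frontend'
          destination:
            ports:
              - 6379

Felix resolves those selectors to the current set of pod IPs, using the metadata the CNI wrote into the datastore, and programs the corresponding iptables (or eBPF) rules on each host.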
The way Calico works is that, by default, we're going to grab a /26 pod CIDR block, assign it to a host, and then IPAM grabs an IP from that block and assigns it to the pod. The Calico IPAM is actually pretty intelligent: if we were to exhaust all the IPs in the /26 for some reason, IPAM can grab another block, and if there were no more /26 blocks to be handed out, it can actually borrow an IP from another host's block and assign it to a pod locally. One of the key features we'll describe in a little bit is being able to manage the default size of these pod CIDR blocks. The default is /26; there are some pros and cons, or tradeoffs, better said, from going smaller or bigger, and we'll discuss that in a little bit.

But here, folks, is a basic picture of how Calico works. You'll notice there are a few different components here: Felix, BIRD, CNI, ConfD. If you're running Calico in your cluster and you do a kubectl get pods --namespace=kube-system, you're going to see calico-node running in there, and if you were to exec into calico-node and run a ps, you'd see these different processes running inside that pod. So, at a high level, this is the basic functionality of Calico. There are some caveats, as you can imagine; for example, the VXLAN functionality is actually driven by Felix. But by and large, this is how Calico works at maybe a 100-foot level.

Let's move forward now and discuss some of the new features from the last year. We've had a flurry of releases, starting with 3.3 back in November of 2018, which seems like an eternity at this point. From 3.3 through 3.8, a myriad of releases with a lot of key features, and we're going to describe some of those key features now.

The first one we want to talk about is in-cluster route reflector support. This is really a scaling concept. If you were to just deploy Calico on your cluster, you would have what we call a full BGP mesh: every host has a TCP connection to every other host. Remember, BIRD has to propagate the routes to every other host, so if you have a hundred-node cluster, on any given host you're going to have 99 TCP connections reaching out to every other host. At 100 hosts that may not be an issue, but if we get up into massive scale, web scale, a thousand nodes plus, that number of TCP connections becomes untenable. So one of the key features is being able to leverage route reflectors: each node keeps just a couple of TCP connections to the reflectors, and the reflectors distribute the routes to all the nodes in the cluster. Really, anywhere above maybe a couple hundred nodes, we would fully invite our customers to start thinking about ways to scale the cluster, typically via a route reflector.

Another feature, which we alluded to earlier, is configuring the block sizes. Again, the default is a /26, which gets you a certain number of IPs to assign to pods on a host, but you can actually increase or decrease that. If you increase the block size, there will be fewer blocks available overall, and potentially fewer routes, but you would want to make sure that you still have at least as many blocks as there are hosts.
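As a sketch of what that looks like, the block size is a property of the IP pool. The pool name and CIDR below are hypothetical; a /16 pool carved into /24 blocks yields 256 blocks, so you would want to stay under roughly that many nodes:

    # Hypothetical pool using /24 blocks instead of the default /26
    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: pool-with-large-blocks
    spec:
      cidr: 10.244.0.0/16
      blockSize: 24
      natOutgoing: true

Note that the block size generally needs to be chosen when the pool is created; changing it later typically means creating a new pool and migrating workloads to it.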
You would want to avoid the situation where you've exhausted the available blocks and Calico now has to borrow single IPs from other hosts in order to fulfill pod allocations. Again, we need to think about this in terms of scaling. You can also reduce the block size. You would potentially have more routes, but there's a nice tradeoff in that the blocks get distributed more fairly amongst the hosts. You may, for whatever reason, be pinning more workloads to a specific node, and smaller blocks let you distribute them more evenly across your hosts.

Another interesting feature, which we rolled out in 3.4, is advertising service cluster IPs to your network gear. BIRD is obviously going to be distributing the pod IPs, but what we did in 3.4 is allow you to start advertising service IPs as well. Typically, if you want to route to a service, you would need a NodePort or an ingress. This allows for native Layer 3 routing and load balancing across services. There are obviously tradeoffs, in that you wouldn't get some of the features of an ingress, you wouldn't be managing HTTP requests, for example, but if you're thinking about just basic Layer 3 connectivity, this is a great feature.

Another feature here is host endpoint protection. Host endpoint protection is something we've had in Calico for a while now, but what we did in 3.4 is allow you to secure node ports from pods within the cluster. What this means is that these host endpoint policies are applied pre-DNAT. If you're in a pod and you're trying to access a node port, you could potentially lose sight of what the original source and destination addresses are, but by making this a pre-DNAT policy, you can actually secure that traffic as it comes from pods within the cluster. So, additional security there.

I see a question here from a couple of slides back: how many route reflectors are recommended in, let's say, a hundred-node cluster? Typically, you would have three route reflectors and you would peer every node to two of them, just for high availability from a route reflector point of view. How do route reflectors get deployed, [inaudible 00:21:50] select a host? Route reflectors could typically be network gear, much like a top-of-rack switch; you could have network gear act as a route reflector and leverage it that way. There's also the possibility of a software-based route reflector. You would potentially put that on one node per rack, the caveat being that you now have a node dedicated to a route reflector, so that's additional hardware you have to work with. Either is a valid option. So, keep the questions coming; we'll answer them as they come up.

Moving right along, a version 3.5 feature is IP address allocation based on topology. For example, if you have multiple racks, or if you're in AWS or another cloud provider and you have the notion of a node availability zone, it may be useful to assign IP addresses based on the topology.
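One way this kind of topology-aware assignment can be expressed, in versions that support it (check the docs for your release), is an IP pool with a nodeSelector that matches a node label. The rack label, pool name, and CIDR here are hypothetical:

    # Hypothetical per-rack pool: only nodes labeled rack=rack-1 draw pod IPs from this CIDR
    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: rack-1-pool
    spec:
      cidr: 192.168.1.0/24
      nodeSelector: rack == "rack-1"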
This may be useful in cases where you have a firewall. Not that we would advocate a firewall, but if you wanted to use one, you could stick a firewall between a couple of racks and create firewall rules that rely on the fact that a certain set of IPs is in one rack and another set of IPs is in the other rack. That's one potential use case for leveraging network topology in the assignment of IPs.

Another interesting feature, in 3.6, is selectors with substring match. When we do policy evaluation, you get the basic equals, not equals, in set, not in set, but we extended the selector matching with a few additional operators: contains string "S", starts with string "S", ends with string "S". This can be useful, for example, for customers with multiple deployment environments, a development cluster, a staging cluster, and a production cluster, who want a label taxonomy that indicates by the label which environment something is deployed in. For example, they may have a front-end application with a label that says "frontend-dev" or "frontend-staging" or "frontend-production." If you didn't want to have to change the policies across these clusters, you could create policies that leverage the fact that the label starts with "frontend," using that "starts with" operator to reference those labels. It just makes it easier to propagate policies through environments if you don't want environment-specific labels.

Next is support for IPAM with KDD, the Kubernetes datastore driver. Typically, again, we're running in Kubernetes clusters, and we want to leverage the API server to push and pull policies into and out of etcd. Instead of having to manage your own etcd cluster, or plug directly into the etcd cluster, this allows for a much cleaner installation, frankly speaking, where you can just leverage the Kube API. One of the nice things about leveraging Kubernetes to talk to etcd is that you get secure communications out of the box, as well as RBAC. It's a much cleaner, simpler way, and really how most folks are installing Calico in production nowadays.

Then, in 3.7, we also rolled out native VXLAN encapsulation. We had IP-in-IP encapsulation, but some networks don't allow IP-in-IP traffic. I'm looking at you, Azure and DigitalOcean. So for those cloud providers we added VXLAN, and you can happily run that overlay on either of them. If you have an AKS cluster and you're leveraging Calico, chances are you've already figured this out and are running in VXLAN mode.

In 3.8, we have IPAM utilization reports: you can view the IP addresses available, essentially, via calicoctl. Before this feature, whether you had one IP available or a million IPs available, you had no easy way of discerning that. With this new feature, it's super easy to get a quick report and understand how many IPs you have free in an IP pool.

Those are some of the key features we wanted to highlight from the last year. Again, think of Calico as the open-source project, and we thank you for using Calico, but if you really want to productize, or productionalize, if you will, your Calico deployment, we fully encourage you to sign up for a demo on the website.
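As a quick illustration of that 3.8 utilization report, calicoctl ipam show prints a summary along these lines; the numbers below are made up, and the exact columns vary by version:

    $ calicoctl ipam show
    +----------+---------------+-----------+------------+--------------+
    | GROUPING |     CIDR      | IPS TOTAL | IPS IN USE |   IPS FREE   |
    +----------+---------------+-----------+------------+--------------+
    | IP Pool  | 10.244.0.0/16 |     65536 | 23 (0%)    | 65513 (100%) |
    +----------+---------------+-----------+------------+--------------+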
We're trying to address the three pillars here that most folks need to address: zero-trust network security, visibility and threat detection, and continuous compliance. We'll pause here for questions; otherwise, I'll hand back to Michael.

Michael: Thank you, Eddie. If you have any more questions, please feel free to type them in. I know there's a little bit of a delay before the questions actually arrive to us, so we're going to hold here for a couple of minutes.

Michael: While we're waiting, I'd like you all to know about our next webinar, which is all about Kubernetes, Helm, and network security. It will be a great webinar. It's being put on by Brandon Jozsa, a solutions architect at Tigera who co-authored a version of Helm that is in Red Hat, so he's very knowledgeable on Helm. Please do attend.

Michael: Okay, so we had a question here, Eddie. Let me see here... Yeah, go ahead and take that.

Eddie: Yeah, so the question is related to network policy: can I confirm that if we use the Kubernetes API server datastore, calicoctl is not required to manage network policies? And yes, that is 100% correct. You can do a kubectl apply -f with your policies, and away you go. Feel free to use kubectl for the application of policies. For calicoctl, you can head over to our documentation on docs.tigera.io or docs.projectcalico.org, and you'll see the reference information on the different things you can do with it. It's really about managing some of the BGP peering, call it the nitty-gritty of the networking, which would still be managed by calicoctl. Great question.

I would also say about Calico: there are four years of sweat equity in this product. Yes, you can do it yourself, but I would give Calico a try, because they've already worked through a lot of the issues that you may run into. That would be my advice.

Yeah, great. And then, I guess there was a second part of that question, in terms of [inaudible 00:29:58] resources [inaudible 00:29:59] instead to manage network policy. We would recommend you work through the YAML for the network policy; that's a much easier way to manage it. At a minimum, you can put the YAML in GitHub or something and start actually managing your policies through this notion of GitOps.

Another question: how can I use Calico to monitor traffic? Calico obviously does the routing and the policy enforcement. If you really want to monitor the traffic, that's where Tigera Secure comes into the picture. We have a nice GUI that helps you visualize the traffic, and we have all the flow logs and all that plumbing set up, so we can grab all this network traffic through [inaudible 00:30:49] stack, through [inaudible 00:30:51], and actually do some visualization on top of that.

Okay, there are no further questions. The other webinar I would recommend you all watch is an on-demand webinar on our website that we did with AWS and Atlassian. It's a case study where Atlassian spends most of the time presenting how they approached network security in their move to the cloud and to Kubernetes. It's a really great, informative webinar. If you haven't watched it, please do give it a watch. It's a good one.