Four Ways to Accelerate Your Kubernetes Project

As the founders of Project Calico, we work with hundreds of teams every year to help them avoid obstacles and gain the most value from Calico.

We observe a common “Kubernetes Journey” that most infrastructure and platform teams progress through as they deploy Kubernetes across their organizations, and we will share that journey in this webinar.

Sometimes we are pulled into projects on fire. Without guidance, many projects run into problems of scale, enterprise integration, and cross-functional alignment that can slow everything to a grinding halt. We’ve seen all these problems and can help.

That is why we created Calico Essentials – our solution for keeping you aligned with industry experts throughout your Kubernetes journey.

In this webinar, you’ll learn the four ways Calico Essentials can help accelerate your Kubernetes project.

  • Training and education for new team members and other stakeholders
  • Best practices workshops on network and network security design
  • Guidance on operationalizing Calico to work with the rest of your tools, infrastructure, and processes
  • Troubleshooting strategies, tips, and tricks

So today we’re going to go through two different topics. The first is the Kubernetes adoption journey. Now, at Tigera, we created Project Calico and we interact with a lot of the community. We have approximately 250 to 300 different meetings per year with Calico users across all different phases of their journey, and we’ve identified some themes, some common trends, across these accounts. I’d like to tell everybody what we’ve discovered, and then I’d like to talk about four ways that you can accelerate that journey.

So the enterprise Kubernetes journey that we’ve observed comprises five different stages. The first stage is basic education. In this stage you’re learning about Kubernetes; you might be learning about some of the different components, how networking works, what pods are, and labels. You may also be looking into things like network security, specifically network policies, at this phase. And really what this stage is for is deciding whether or not Kubernetes is for you.

When you get through this stage, if you decide that Kubernetes is something you would like to pursue, the next stage is generally a lab or sandbox stage. In this stage you get Kubernetes installed and running, you get a working cluster, and oftentimes this is when you start running some sample workloads and get to know how Kubernetes works. At this point you will also generally start to design an operating model: if we do deploy this, how are we going to manage it? Who’s going to be involved? So that’s stage two, and in stage two you’re typically working as one team. Oftentimes this might be one or two DevOps engineers or SREs, or, if it’s over in the IT organization, a platform engineering team, but stage two is usually isolated within one team.

Stage three is generally running a pilot application, usually a pre-production type of application. Most of the time these are going to be some form of back-office app, something where, if the application goes down, it doesn’t create tremendous impact on the business. At this stage a few different things change. One is that you’re now generally interacting with more teams, so if you’re the platform team who set this up, you’re probably now working with a DevOps team. In some cases you may be working with the security team, and you might be working with a networking team. But in stage three what you’re doing is implementing the design that you created in stage two and trying to get this to operate within the rest of the enterprise. And generally the outcome of stage three, if you have an application that’s been running with good uptime and you’re able to prove the value, is that this is when we see most organizations or most initiatives going out for budget, saying, “Hey, look, Kubernetes is a thing. We want to do this. We’d like to secure a budget, and here’s how much we think it’s going to cost.” The Kubernetes initiative becomes relatively formal at that point in time.

So then you enter stage four, and in stage four what we generally see is that one single application is rolled out to the Kubernetes cluster and run in production.
And in this stage what happens is, you really start to flesh out the operating model, meaning issues will come up, you’re resolving those issues, and you build what I call muscle memory around managing Kubernetes with multiple people involved in doing that. There’s also typically some alignment you have to have, since you’re running in production and touching production networks. You might even be touching a DMZ, so these applications might be exposed over the public internet. You’re going to have to start working with a lot of additional teams, and this is really the stage where, when you have success, there’s generally some pretty big internal promotion, and all the other application teams are going to want to get onto this platform. That’s when we see stage five, which is where you’re rolling out multiple enterprise applications.

So this journey seems consistent across the hundreds of people and hundreds of organizations that we’ve worked with. This really seems to be the theme, and most folks who are implementing Kubernetes will fall within one of these stages. Now, what I’m here to talk about today are some of the pitfalls that you’ll hit as you progress through this journey, and then I’m going to highlight four different ways that you can avoid some of these pitfalls.

So one of the first things is networking. It’s typically not the very first thing you think about with Kubernetes, and oftentimes it’s something you think about a little bit too late. But the network is a really critical aspect of Kubernetes. Traditional applications that might have run on VMs make relatively little use of the network, but in Kubernetes everything communicates over the network, especially if you’re designing a microservices architecture. So if you don’t put consideration into the network, it can create some issues. As an example, with one distribution of Kubernetes running on AWS the network works a certain way, with another distribution it works differently, it can differ again on other clouds, and then the network is completely different on-prem in your data center. So if you don’t really put some thought into where you’re going to run your clusters and at what scale, the networking can become a really big problem down the road. One of the suggestions here is to think about that up front.

Another is getting your label taxonomy correct. You really want a standard way that labels are defined on your pods and other resources within Kubernetes. If you don’t get that right up front and define it as a standard, then when you get other people into your cluster they won’t be following any standard, and that can create some challenges for you down the road; I’ll show a quick example of what I mean in a moment.

Also, at some point, especially when you start to roll out some of these initial applications, they will likely need to connect to things that are outside of the cluster. It might be a database that’s running out in your network, there might be other applications that they rely on, or it might be file storage that they have to use. Whatever it is, you do need to integrate your cluster with your existing network and with the existing applications that are running in that network, and there are some complexities there.

And then finally, as you progress through each stage of the journey, at every single stage you’re going to start interacting with more teams.
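To make that label taxonomy point concrete, here is a minimal sketch of what a standardized label set can look like on a workload. The keys and values used here (app, tier, environment, owner, and the orders-api service itself) are illustrative assumptions rather than anything Kubernetes or Calico prescribes; the point is that every team applies the same agreed keys, because these are the labels your network policies will later select on.

```yaml
# Illustrative only: a Deployment carrying one possible standardized label set.
# The label keys (app, tier, environment, owner) and all values are placeholders
# agreed on by the platform team, not something Kubernetes or Calico requires.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
    tier: backend
    environment: staging
    owner: payments-team
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api        # the same keys are repeated on the pods,
        tier: backend          # because network policies select on pod labels
        environment: staging
        owner: payments-team
    spec:
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.0   # placeholder image
          ports:
            - containerPort: 8080
```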
At some point you’re going to be working with a network engineering team, you may be working with the DevOps team, you will be working with the security team, and potentially even a compliance team. So these are the areas where a Kubernetes project will generally stall, because if you don’t get these things right, you have to go back and do things over again. So that said, I’m going to talk about four ways that you can accelerate the Kubernetes journey and reduce the risk of those stalls as you progress from stage to stage.

The first is training the stakeholders. There will be multiple people involved in the project. Generally, when you start at stage two and you’re setting up a lab or a sandbox, it’s often an infrastructure team that’s really focused on this. When you move into a pilot and pre-production stage, now there’s a networking team. The networking team probably doesn’t know anything about Kubernetes, and you’re going to need to train these people and help them understand what Kubernetes is, what the implications are, why you’re doing it, and how it’s going to need to access the network. In stage four, you’re now running a production application, so the security team will be involved. The security team also generally doesn’t really understand Kubernetes, and that can create some headwinds for you. And then in stage five, when you have multiple applications, some applications will have compliance requirements. Generally compliance is something applications have if they’re dealing with customer data, or with customers that are outside of your network, and many companies just have their own internal compliance requirements in order to maintain their security posture.

So up front, if you sit down and really put together a training plan, that is going to help accelerate your journey. The fact is that you’ve gone through all your education and you know what Kubernetes is, but others don’t, and it is quite a bit different from traditional infrastructure. The learning curve is quite steep, and this is especially an issue if they don’t really have the time right now to invest in learning. The other thing is that the longer it takes them to learn, the more issues they’re going to find, the more constraints and objections they’re going to raise, and the more issues you’re going to have to work through. If you have a training plan, and if you’re able to execute on it quickly and educate these folks efficiently on the right things they need to know, it’s going to significantly increase the speed at which you can deliver your Kubernetes project.

The second aspect is to put a good amount of time into building the right architecture. The reason is that if you do build the wrong architecture up front, then as you start to get more and more people involved, oftentimes what you have to do is go back and re-architect the cluster: basically burn everything down and then build it back up. We hear this happen a lot, and nobody really wants to go back and do that rework. Not only does it slow down the project, but others around you start to lose faith that Kubernetes is really the right thing to do. A first impression is typically the impression that sticks, and it’s very, very hard to change that positioning in their minds.

So the common causes of rework: the first is networking choices not taken seriously up front.
You do need to learn about the different networking options, the different combinations and permutations of where you run Kubernetes and how that’s going to work, because if the network’s unreliable, the applications simply are not going to work. The second is a label taxonomy that gets defined too late. That’s incredibly challenging, because you really just don’t know what’s out there; you’re going to have to take inventory, and you’re probably going to have to create the standards after things are already running in the cluster. That can be a very painful process. And the other is policies not being implemented soon enough, and this is really, really common. In the initial stages of rolling out these applications, people don’t use network policies. As they continue to progress through their journey, they’re now working with networking teams who have specific segmentation requirements, and with security teams who have specific security controls that need to be implemented. Well, if you only start thinking about these things after all these applications are running with all kinds of random labels, it becomes very, very challenging just to understand what the behavior of an application is and how to even write these policies. That can really be avoided if you do some of the work up front and just start adopting policies for the applications that are running.

So this is up-front planning that can help avoid these issues. Kubernetes relies on the network, and the network is going to work differently across each of those combinations and permutations of environments, so make the right decision up front. Then define a label taxonomy that’s going to scale: think about this as being hundreds of applications, or thousands of applications. If you frame the problem that way, it will help you design a label taxonomy that works just fine from the early days, and as you scale you don’t have to go back and reinvent the wheel. A suggestion here as well is to implement policies early, while the connections between the microservices and pods are still pretty well defined and known; I’ll show a small example of that in a moment as well.

The third is enterprise integration. Most applications are not really an island. Applications need to communicate with resources like databases and with other applications that are probably not living inside of the cluster, and that’s going to require you to integrate with the existing network, and each network is a little bit different. If you happen to be deploying this into a data center, that networking can be incredibly complex. You could be integrating with things like Cisco ACI devices and top-of-rack switches, and these are pretty complex problems to solve. To really reduce all those headwinds, what you can do up front is some integration planning. Meet with your networking team early, get them educated on how Kubernetes networking works and what you’re going to need, and then work with the team to identify how you will integrate with the network. Are you going to use something like BGP peering with the network? Do you need to integrate with top-of-rack switches? You’re also going to want to get approval for the actual CIDR block that’s going to run inside of your Kubernetes network, and there’s going to be some process there in terms of defining how many pods you think you’re going to be running. If you get that number wrong, it can be a little bit painful to go back and redo it.
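To give a flavor of what that integration planning turns into, here is a rough sketch of a Calico BGP peering and an approved pod address pool. This is an illustration under assumptions: the rack label, peer address, AS number, and CIDR are placeholders you would replace with values agreed with your network team, and the exact fields can vary a little between Calico versions.

```yaml
# Peer the Calico nodes in one rack with their top-of-rack switch.
# The rack label, peer IP, and AS number below are placeholders.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor
spec:
  nodeSelector: rack == 'rack1'   # only nodes labeled rack=rack1 peer with this switch
  peerIP: 192.0.2.1               # ToR switch address (documentation range)
  asNumber: 64512                 # private ASN agreed with the network team
---
# The pod CIDR that the network team has approved for this cluster.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: approved-pod-pool
spec:
  cidr: 10.48.0.0/16    # sized for the number of pods you expect to run
  ipipMode: Never       # no encapsulation; pod routes are advertised natively
  vxlanMode: Never
  natOutgoing: false    # pod IPs are routable over BGP, so no NAT on egress
```

Resources like these are typically applied with calicoctl once the peering details have been confirmed with the network team.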
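And going back to the advice to implement policies early: here is a minimal Kubernetes NetworkPolicy of the kind you might write while an application’s connections are still well understood. Calico enforces standard Kubernetes NetworkPolicy like this one; the namespace, labels, and port are hypothetical and tie back to the label taxonomy sketched earlier.

```yaml
# Only storefront pods may reach the orders API, and only on its service port;
# other ingress to these pods is dropped once a policy selects them.
# The namespace, labels, and port are placeholders for illustration.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: orders-api-allow-storefront
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: orders-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: storefront
      ports:
        - protocol: TCP
          port: 8080
```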
And the fourth is resolving problems that come up within the cluster. The newest element of the distributed application that’s really hard to debug is the network, and that’s just because the network is used so heavily; it hasn’t typically been used this heavily in the past with traditional applications and traditional infrastructure. And the thing is, when there are networking problems going on, network engineers often don’t understand Kubernetes well enough, and they certainly don’t have the right tools to go in and debug what’s happening. If you do experience these kinds of problems, the downtime is very expensive, and again, this can really block your project if people start to lose faith.

So the best thing to do here, in the early stages when you’re trying to figure out how you’re going to operationalize your cluster, is to create runbooks. The Kubernetes experts who are working on this project and rolling it out initially need to document how to investigate and debug problems, and define these as runbooks. That lets you train the operations staff on how to execute those runbooks in a more [inaudible 00:18:10] fashion to try to identify those issues. And generally the ops team is also going to need somebody they can escalate to: if the runbook’s not working, you’re going to need a local expert on your team who can pop in and help resolve the issues that the runbooks don’t.

This planning is something you can do completely on your own. Using these four different steps, really thinking about this up front and building out the right program and plan, will help you accelerate that journey. We are also here to help, and what I’d like to do is spend a couple of minutes introducing a program we have called Calico Essentials. The whole focus of Calico Essentials is to act as an extension of your team and to help you accelerate your journey, to get all these plans and all the documentation done right up front, so that you don’t hit these headwinds along the way.

Our mission with this initiative is really to accelerate your Kubernetes journey. We want you to be successful at every stage, and we have a core value here at Tigera where we say that the customer is the hero of our story, and we live by that. We really want to make you the hero. We want to help you get through this, we want to make sure that you don’t run into the bumps and issues that cause other people to lose faith in your project, and we want to give you the leverage that you need. So in this case, you can use Tigera as an extension of your team to really help you move this project forward and get that guidance and all the right resources at each step of the way.

Now, Calico Essentials is of course built around Project Calico, and Project Calico is the best open source networking and network security product that’s available on the market today. It’s blazingly fast: it uses the native Linux data plane, so you get bare-metal performance. By default it’s a layer 3 network that doesn’t use encapsulation, and what that means is that it’s a lot simpler, so it’ll be a little easier to use and debug. It’s massively scalable: it’s built on top of BGP, which is the routing protocol of the internet, and the internet is quite scalable. We scale in the same way, and we can really scale to the limits of what Kubernetes can do. In terms of network policy, we are the industry standard.
So if you go out there and use any of the managed Kubernetes offerings, whether that’s EKS, AKS, GKE, or Anthos, even IBM’s IKS, they all embed Calico within their offerings for our network policy features, and you can find that in the documentation for your Kubernetes engine: just search for network policy and they’ll tell you how to get Calico policies running. We’re also embedded in many of the Kubernetes distributions that are out there, so if you’re using something that’s not a managed Kubernetes but a distribution of your own choosing, you’re probably going to have Calico integrated into that. If you run vanilla Kubernetes, you get to make your own choice, and we’d hope that you would choose Project Calico, given that we are the industry standard out there. The other thing is that Calico has been running for a very long time out in real-world production clusters: very large financial services organizations, the largest SaaS companies in the world, and very large telcos are all running this software and they rely on it. Today we have about 150,000 known clusters running Calico, and I’m sure there are many more that we don’t know about. Project Calico is the way to go.

What Calico Essentials does is take our experts (we created Calico, and we live and breathe Calico every single day) and give you access to them as an extension of your team. What we’re able to do then is help with the on-ramp from stage four to production. When you hit stage four and you’re really going out to production, we have a software product called Calico Enterprise that provides capabilities that make it very, very easy for you to work with your security teams and compliance teams, and we give you an eval to give that a try.

Now, what’s really critical is that, as we’ve mentioned, across each of these stages of the journey there are new people starting to get involved. They’re going to need to be trained, and we offer training. This is a service with a yearly subscription, so we can align it to exactly when you need the training. We can come in and train your team, and we do this virtually, so if you have distributed teams, they can join these training sessions from wherever they are in the world.

We also do design workshops. So if you’re trying to figure out how to build your label taxonomies, how to build the right architecture, how to select the right networks, how to integrate, we provide all the best practices through these workshops. And again, this is not something you just take once up front; you’re going to need different workshops depending on where you are within that journey. So this is something you get that’s really intended to help you accelerate the journey.

The process that we use has three different stages. The first is that we help you get the right design put together; we use our workshops to do that, and you may want some specific training as well for some of the core team that’s working on this project. Then we work on enabling both your team and the other teams, and we do this through training and through additional best practices workshops
that are really aimed, maybe not just at the core team, but at the extended teams, to help them get the right architecture implemented. And then we support you: you can file a support case, you can get technical help from us, you can escalate issues if you’ve got something that’s really urgent, and we do regular sync meetings with you just to make sure that we’re in alignment and that we’re helping you accomplish your goals.

This is just a sample of the training catalog, and what you’ll notice, over in the second column from the right, is the audience. Some of these trainings are really designed for a platform team, but you’ll notice there’s also quite a bit designed for network engineering and security as well as DevOps, and we have a variety of levels for that training. The training catalog is something we’re constantly working on based on feedback we get and the needs of our customers, but these tend to be the most popular courses today. We also have hands-on workshops, and the output of these workshops is generally some kind of document, a Visio diagram, your standards; it just depends on what the workshop is and what the goals are. But this is something where you can call on us at really any time, when it’s the right time, and we’ll come in and help.

I did want to give a quick little promotion of what we call Calico Enterprise. Calico Enterprise is really intended for stages three, four, and five of your journey; Calico Essentials is really for stages one, two, and three, to accelerate things and get everything set up, with the architecture ready for scale. But when you start to work with more and more users and more and more teams, and you have a lot more security requirements and integration requirements just because of the number of applications, Calico Enterprise is really designed to help you get through some of those hurdles.

Calico Enterprise gets you a web-based user interface, which makes things a lot easier than dealing with the YAML files and [inaudible 00:27:27] commands. We give you network visibility, meaning you basically get a log of all network connections that were made, which policies evaluated those connections, and whether the policies accepted or denied them, along with troubleshooting tools. So if you do have a connectivity issue, we want you to be able to identify the source of that issue within a minute, and then, after identifying the source, be able to resolve it within another minute, and we provide the tools that really help you do that.

The other thing is a policy workflow. Working with policies is not simple; it takes some time, and they can be quite delicate as well. It’s a common story: a policy written the wrong way, perhaps you fat-finger something in there, and suddenly it’s not implemented properly, or maybe you’ve just brought DNS down within your cluster. We hear these stories. The policy workflow can help by generating a policy for you, based on the current behavior and the networking that your application is doing. It can also give you the capability to preview a change, to say, “Hey, if I made this change, what would happen?” So you can see, “Oh, I made this change, but I’m actually impacting some service that I didn’t intend to impact,” and you can catch that before you roll it out.

It also offers something called staged mode, where the network policy runs in a report-only mode: it isn’t actually enforcing anything, but it tells you what it would have done. If you run a policy in staged mode for a period of time and things are looking good, you can go and commit it. So this is a workflow intended to ensure that changes you make in your live cluster are not going to have an impact on anybody else. We also give you fine-grained control over connections outside the cluster. If you have, for example, a database you need to connect to, there’s not really an easy way to do that with Kubernetes network policies, so we introduced the capability to have a single pod connect to an FQDN, a fully qualified domain name, including wildcards.
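As a rough illustration of how those two ideas can come together, here is a hedged sketch of a staged policy that allows egress by domain name. StagedGlobalNetworkPolicy and DNS rules are Calico Enterprise features, the exact kinds, tiers, and field names depend on the product version, and the selector, order, and domains below are placeholders, so treat this as a sketch rather than a recipe.

```yaml
# Illustrative sketch only: a staged (non-enforcing) Calico Enterprise policy
# that would allow the selected pods to reach two external names by FQDN.
# Tier, order, selector, and domains are all placeholder assumptions.
apiVersion: projectcalico.org/v3
kind: StagedGlobalNetworkPolicy   # evaluated and reported on, but not enforced
metadata:
  name: default.orders-api-external-egress
spec:
  tier: default
  order: 100
  selector: app == 'orders-api'   # reuses the label taxonomy sketched earlier
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        domains:
          - "db.example.internal"          # placeholder database FQDN
          - "*.storage.example.internal"   # placeholder wildcard domain
```

Once a staged policy has been reporting the expected allows and denies for a while, the same rules can be promoted to an enforced policy.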
We also offer compliance tools, some advanced threat defense, and federation as well. If you’re working across multiple clusters, you probably don’t want to design these policies over and over again for each cluster; you can define a set of standards and have those federate across your clusters. If you’re interested in this product, it has a completely free trial. It’s a hosted trial, so you don’t even have to have your own infrastructure to use it; just go to tigera.io/trial and give it a try.

So I just wanted to wrap up by saying that, as you’re working on this journey, if you go at it on your own and you decide you’d like to define the training, define the architecture, and get all these things set up up front and built into your project plan, then one of the things you can do to get free help from us is to join our Calico Users Slack group. The engineers who built Calico are in there, there are thousands of end users of Calico in there, and it’s a great resource for getting help from the community. And if you’re interested in learning a little bit more about Calico Essentials and seeing whether it’s something that will help accelerate your journey, we’ve made it a very affordable package, really intended for folks who are just getting started with Kubernetes and don’t have a big budget. You can contact us at contact@tigera.io, and we’ll get back to you, share more details, and make sure it’s in alignment with your specific project.