Network Security and Compliance for Kubernetes on AWS with Atlassian and Tigera

Securing Atlassian’s Journey to the Cloud

Learn how Atlassian transformed their traditional, monolithic application architecture into microservices on the cloud, in a secure and compliant manner with AWS and Tigera.

Watch this on-demand webinar and learn:

How Atlassian detected and stopped a bitcoin mining abuse in less than 15 minutes
How they leverage AWS, Kubernetes, and Tigera security policies to create a defense-in-depth network security posture
Best practices for planning the migration of applications to a containerized, microservices architecture

Complete Transcript

Carmen Puccio: Hi everyone, thank you for attending today’s webinar. The webinar will begin shortly. Hi everyone, and welcome to today’s webinar, Network Security and Compliance for Kubernetes on AWS with Atlassian and Tigera. When you joined today’s webinar, you selected to join either by phone or by computer audio. If, for any reason, you would like to change your selection, you can do so by accessing the audio pane in your control panel.

Carmen Puccio: Also, from this control panel, you have the opportunity to submit questions to today’s presenters by typing your questions into the questions pane. We will collect these and address them during the Q&A session at the end of today’s presentation. If, for any reason, we cannot get to your question, we plan on responding to each of you personally through email.

Carmen Puccio: The decks will be available through SlideShare, along with the recording of the webinar, two to three days after the conclusion of this presentation, so keep an eye out for that email. With that said, my name is Carmen Puccio, I’m a principal solutions architect here at AWS, and I’ll be your speaker and moderator for today’s webinar. I’m joined by Karthik Prabhakar from Tigera, and Matt Whittington and Corey Johnston from Atlassian.

Carmen Puccio: Okay, so let’s get started. Modern application development. When we talk about modern application development, we’re essentially talking about the new normal. Meaning, more and more companies are increasingly global and products are completely digital. Cloud, mobile, big data, and social technologies have really impacted application development. They’re building new digital disruptions and forcing change even faster.

Carmen Puccio: This is really the new normal, and this means that application leaders must splice, essentially, a digital DNA, if you will, into their culture, processes and technologies, to transform into top performing organizations.
With that, there’s this need for rapid innovation. To be competitive, you really need to innovate as rapidly as possible. And when it comes to this innovation, and going as quickly as possible, and trying to stay competitive, we talk about modernization.

Carmen Puccio: These days, modernizing your business applications is really an inevitable aspect of doing business. Businesses want to drive growth, they want to accelerate their migrations with repeatable processes and patterns, and they really want to maximize the value as they modernize these applications. Because at the end of the day, it’s really about speed and agility for the digital business. The faster they collect and analyze this feedback, the quicker they react for their customers, and the more they can reduce their costs, the more they can experiment on new ideas.

Carmen Puccio: The faster those ideas get to market, the more successful they are. At the end of this, it’s like, what does it look like in real life and what are some options for you to modernize an application? Containers, right? Containers to the rescue. When it comes to how customers can accelerate application delivery while reducing costs, a very common approach nowadays is to leverage containers. The container story is especially compelling, because it allows customers to be more effective. It allows them to be more agile.

Carmen Puccio: Containers provide that option to deliver applications quickly, for the development and operations teams who are under pressure to adapt to growing business needs. Businesses know that getting software products and services to market faster translates into gained market share. Using containers can help enterprises modernize these legacy applications and create net new cloud-native applications that are both scalable and agile.
So container frameworks, such as Docker, provide that standardized way to package applications, including the code, the runtime, and the libraries, and to run them across the entire software development lifecycle, regardless of the environment.

Carmen Puccio: Gartner predicts that, I think it’s in 2020, more than 50% of global organizations will be running containerized applications in production, which is up from 20% today. So the benefits of containers, right? You’ve heard about them. It’s a hot topic, right? So what are some of the benefits? The first one is this idea of defined as code.

Carmen Puccio: So containers package your code, with the configuration files and the dependencies, that the application needs to run consistently in any environment. When it comes to microservices, containers provide that process isolation. It makes it very easy to break apart and run applications as independent components, called microservices. Modern applications are shifting to cloud-native and microservice architectures, and containers are a perfect way to encapsulate these individual pieces into portable units.

Carmen Puccio: Automated testing and deployment. So development methodologies are really changing, with developers shifting to this agile method, or devops, and leveraging continuous integration and continuous deployment systems, or CI/CD. Because containers are lightweight and portable, they make great vessels, if you will, to wrap code up in order to push it through these new software pipelines.

Carmen Puccio: The monitoring aspects. Containers provide process isolation, as I’ve said before, that lets you do things like granularly set CPU and memory for better use of compute resources. Also, with that said, it’s important to think about your overall application level monitoring and how you will ensure the overall health of your application as your containers spin up and down in the dynamic environment.
Then, last but not least, is the security and compliance controls. Because containers help enforce an infrastructure as code mindset in your application lifecycle, another benefit is at the security level.

Carmen Puccio: You have the ability to set, and version, not just the container itself, but think about, maybe you want to set versions around the packages that make up your application. If you combine this with processes like vulnerability scans of the container image, and incorporating automated checks into your continuous delivery workflow, your security teams and your developer teams are able to start working together to ensure a high level of security for your containerized application services. I went one too fast.

Carmen Puccio: So Amazon container services. Since 2014, AWS has launched more than 50 new features and services to help developers better create and manage containers. The mission at AWS is to make it the best place to build and run your modern applications for containers. The goal is to remove the undifferentiated heavy lifting from your teams, to basically give them the ability to get new ideas off the ground and out the door, as well as making it easy to optimize your existing applications.

Carmen Puccio: Customers love AWS container services, because they not only provide the broadest set of capabilities for managing their containerized applications, but AWS is also the easiest place for them to run containers. Our services are deeply integrated with the AWS cloud. So you think about things like IAM integration, security, VPC networking, load balancing, all surrounded by your devops workflow and your developer tools. It’s a compelling story why customers like running containers on top of AWS.

Carmen Puccio: Then, when it comes to the foundation, and having this as part of your application lifecycle, a lot of times, people talk about the monolith versus microservices.
Think about the monolith. It’s essentially that thing that’s been sitting in your data center forever. You might not necessarily know what it is, or the people that built it don’t necessarily work here anymore. But you really want to think about how you can take that aspect of your application and move towards a microservices architecture, because that’s really where microservices help you build and deploy faster.

Carmen Puccio: You aren’t running in that tightly coupled single service, like you would be in that legacy monolith. People talk about how the cloud enables you to fail fast and quickly learn from your mistakes as you experiment. The easiest kind of innovation is the kind that drives incremental change with lots of experiments. Meaning, it’s much easier to make those iterations, so you can learn faster.

Carmen Puccio: You want to go as quickly as possible, so it’s okay to fail. It’s okay to learn from your mistakes, because, essentially, by putting these pipelines in place, and having the ability to do a code check-in that will kick off a new environment, it allows you to learn from your mistakes, but then quickly iterate and deploy net new versions of the application, right? If you have teams that are staffed appropriately, that understand the pipelines, and understand what it takes to actually push these aspects of your application out into your environment, it really helps them make decisions very rapidly. It allows them to put in place patterns for moving these applications from that legacy monolith into a microservices based architecture, to be a lot more scalable, and to be able to react to change at a quicker pace.

Carmen Puccio: So when we talk about monitoring and the end user experience, it’s really important. I always talk about this, regardless of where you’re running.
If you’re running in a legacy environment, if you’re running on top of EC2, if you’re running in containers, monitoring is always an important story. Because monitoring is that part where you’re maintaining the reliability, the availability, and the performance of your application. You need to think about collecting data from all parts of your application, and you need to think about how you can easily debug a multi-point failure if one occurs.

Carmen Puccio: You need to start answering questions, again, regardless of whether this is a container or not. Specifically: what are your goals, what resources will you monitor, how often are you going to monitor those resources, what tools are you going to use, who’s going to actually perform the tasks, and who’s going to be notified when something goes wrong? These are very important questions you need to think about when monitoring your modernized applications or even your legacy applications.

Carmen Puccio: Then, when it comes to the CI/CD process. People have been throwing the term CI/CD around for a very long time now. But just to sum it up, it stands for continuous integration and continuous deployment. Customers who are modernizing their applications see tremendous benefit, and tremendous value, when incorporating CI/CD and devops practices into that application lifecycle. By incorporating automation, and really adopting that cultural change of learning how to integrate frequently, customers are able to address bugs quicker. They’re able to improve software quality and reduce the time it takes to validate and release new software updates.

Carmen Puccio: Then, when it comes to security, again, just to touch on this, you really want to think about how you can secure the application, but also how you can incorporate automation and detection through the entire deployment process.
You want to try and find, or put something in place to address, blind spots, without slowing down innovation. Because the innovation piece is key, as I was talking about before. Your security teams, your operations teams, and your developers need to realize goals.

Carmen Puccio: Things like, if you think about it, how do I embed security knowledge into my devops teams, so they can secure those pipelines, so they can design and automate. Right? Then, at the same time, they need to learn how to embed the application development knowledge, and automated tools and processes, into those security teams, so they can provide that security at scale in the cloud. Because the end goal is that you want this application to run, you want to ensure that the solutions are only working as intended, and you want to make sure that you have the smallest blast radius possible.

Carmen Puccio: Really, you want to think about how you can do that through automation. With that said, I want to kick it over to the Atlassian team. We have Matt and Corey from Atlassian, and they’re going to tell you a little bit about their story of securing their path to AWS.

Corey Johnston: Thanks Carmen. My name is Corey Johnston, I’m the Kubernetes platform team lead at Atlassian. Next slide please. At Atlassian, we’re all about unleashing the potential of every team. Our team collaboration and productivity software helps teams to organize, discuss, and complete shared work. Teams at more than 138,000 customers across large and small organizations, including General Motors, Walmart Labs, Bank of America Merrill Lynch, Lyft, Verizon, Spotify, and NASA.

Corey Johnston: All these use Atlassian’s project tracking, content creation and sharing, and service management products to work better together, and to deliver quality results on time. Continuous integration and continuous deployment is an integral part of the software development lifecycle.
Naturally, it’s a big part of how we build software. It’s also a service we offer to our customers.

Corey Johnston: Internally, Atlassian uses Bamboo to continuously build and deploy its products and services. We’ve got a dedicated team which is responsible for running Bamboo builds for the entire company, making it trivial for other teams to set up plans to build and ship their apps in a compliant way, with little friction. To give you an idea of scale, we do about 88,000 builds a day. We also offer builds via Pipelines to the public, where any developer hosting their code in Bitbucket can quickly set up a build pipeline from a simple YAML file to continuously integrate, build, and deploy their code.

Corey Johnston: While this is a fantastic service for developers, at the end of the day, for platform administrators like my team, this represents arbitrary code execution, because we’re giving users the ability to do anything in these build environments. As such, running services like this on a multi-tenant platform, like Kubernetes, means we take security really seriously. We’ve invested a lot of time and effort into hardening our platform and limiting the blast radius of any potential attack.

Corey Johnston: Our team is the Kubernetes Infrastructure Technology Team, or KITT for short. As you can see from the picture of the Hoff and our name, we’re themed around Knight Rider. Our team’s big, hairy, audacious goal, as we like to say at Atlassian, is to ultimately host 95% or more of all of the compute workloads of Atlassian on our containerization platform over the next three years. We need to support a really diverse workload, consisting of batch jobs and persistent services. Prior to our team forming, there were a bunch of smaller product related teams, each trying to run and manage containers themselves.
Our team was essentially spun up to de-duplicate this effort and to offer Kubernetes as a production grade service, so other teams don’t have to do it.

Corey Johnston: We’ve got some really important requirements. First of all, self service. Again, we want to unleash the power of every developer, so it needs to be trivial for developers to run their containers without needing our team to set them up each time. Users shouldn’t need to touch or see the underlying hardware. We provide a redundant, multi-region platform, and they can run on our stack wherever there is an Amazon presence.

Corey Johnston: The platform should do the heavy lifting for them. Things like authentication and authorization, logging, monitoring, compliance, auto scaling, cost visibility and optimization. This should all come out of the box. There’s ultimately nothing for the users to administer, upgrade, or maintain. We want to get their services up and running as quickly as possible with no friction or red tape.

Corey Johnston: Looking back, since we spun up the team back in 2016, there have been two goals we’ve focused on. First of all, in 2017, we focused on migrating our job-based or batch workloads from other container technologies across to Kubernetes. This is predominantly CI/CD based workloads driven by Bamboo or Bitbucket Pipelines. Then, in 2018, once we had this workload running on Kube, we pivoted to services workloads. Specifically, we’ve been working closely with our PaaS team, our Platform as a Service team, to help them build the next gen of our PaaS, called Micros, directly on top of Kubernetes.

Corey Johnston: Up until that point, our PaaS essentially consisted of scheduled Docker containers running on dedicated EC2 instances, with every service behind its own ELB and having an ASG of its own. It worked really well, but it’s not as efficient or fast as we’d like it to be. Let’s look at some of the challenges we faced. Where were we three years ago?
Pre-AWS, our team used to run a customized on-prem container platform out of data centers.

Corey Johnston: It was based on OpenVZ and a lot of Silicon Mechanics infrastructure in North American data centers. Now, three years ago, we had some serious obstacles preventing us from scaling. Firstly, a monolithic application architecture involved running a separate container for each customer, with its own database instance, and a separate copy of each application that our customers wanted to run. It wasn’t efficient.

Corey Johnston: During off peak periods when our customers would sleep, their containers and the underlying physical infrastructure stayed online. It couldn’t shrink, and it couldn’t grow. Upgrades were really hard. We developed really sleek upgrade processes, but at the end of the day, we needed downtime to physically stop each customer environment, upgrade the instance, and restart. We optimized the process down from the original four hour outage, but we were still ultimately scheduling rolling upgrades for hundreds of thousands of copies of applications.

Corey Johnston: As you can imagine, this doesn’t scale and it’s heavily prone to error. This architecture meant that whenever we wanted to grow, our team would physically need to deploy new racks in data centers. As you can imagine, provisioning typically took two to three months to negotiate contracts, organize floor space, lay cabling and network links, and then install racks into the data centers. This meant we were limited to managing a handful of physical sites.

Corey Johnston: We recognized that if we wanted to grow our platform to cloud scale, we’d need to switch to a cloud-native, multi-tenant application architecture, where a single instance of our apps could support thousands of customer workloads. A true SaaS based architecture which would remove these choke points I mentioned and make it faster and cheaper for us to scale as our business grew.
Let’s pause and take a look at three of the big challenges we faced when implementing Kubernetes. Firstly, in Kubernetes clusters, there’s a lot of east-west, intra-cluster traffic between pods.

Corey Johnston: Traditional host- or network-based security models are no longer sufficient for these dynamic flows, where pods are [inaudible 00:19:23] and often come and go within minutes. Flows need to be granted and denied with much more granularity. For example, at a pod or at a namespace level. Kubernetes enables Atlassian’s platform administrators, working with our security team, to define a baseline cluster security policy, including, for example, default denies between neighboring tenants, restricting access to privileged ports, such as the public port or the AWS metadata service, but allowing access to authorized, shared services, such as cluster DNS.

Corey Johnston: End users then have the ability to grant or deny additional flows on top of this by layering their own custom policy on top. Having the capability to define and enforce policy is just the start. Today, we manage more than 20 clusters, the largest being a fleet consisting of about 14,000 vCPUs and 50 terabytes of RAM at peak, several hundred of Amazon’s biggest boxes. With this many machines, it’s important to be able to deploy changes quickly and transparently across the entire fleet. Tigera’s technology updates each host’s iptables virtually instantly to enforce the policy we’ve defined.

Corey Johnston: It then handles the update. We can push changes across the entire fleet in minutes. Kubernetes is inherently about multi-tenancy, rather than having dedicated servers for each bespoke application. It’s about leveraging pools of hardware and multi-tenanting diverse workloads across them. Doing this in a secure way requires tenants’ pods to run side by side on the same hardware, to minimize costs and to consolidate hardware.
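As an illustration of the baseline policy described above (default deny between tenants, with an allowance for cluster DNS), here is a minimal Python sketch that builds standard Kubernetes NetworkPolicy manifests as plain dictionaries. It is not Atlassian’s actual configuration: the function name `baseline_policies`, the namespace `tenant-a`, and the `kube-system` / `k8s-app: kube-dns` selectors are assumptions chosen for the example.

```python
import json


def baseline_policies(namespace: str) -> list:
    """Build two NetworkPolicy manifests for one tenant namespace (hypothetical helper)."""
    # An empty podSelector matches every pod in the namespace. Listing both
    # policy types with no ingress/egress rules denies all traffic by default,
    # which also leaves the metadata service unreachable unless allowed later.
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    # Policies are additive, so this re-opens only DNS lookups.
    # The namespace and pod labels below are assumptions about how the
    # cluster DNS service is labelled.
    allow_dns = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-cluster-dns", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": "kube-system"
                                }
                            },
                            "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                        }
                    ],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }
    return [default_deny, allow_dns]


if __name__ == "__main__":
    # Print the manifests as JSON, which kubectl also accepts.
    print(json.dumps(baseline_policies("tenant-a"), indent=2))
```

End users layering their own policy would add further manifests in the same namespace; because NetworkPolicy rules are additive, such policies can only open extra flows on top of this baseline.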
Corey Johnston: We need to ensure they cannot access neighboring workloads’ processes or data. That they’re isolated, in other words. In this way, our workload’s quite a bit different to other clusters we’ve seen, where there’s an inherent trust between pods. In our cluster, there’s often no trust. For example, with Bitbucket Pipelines enabling our customers to build and execute arbitrary code, it’s supremely important that customers can only access their own resources and stay within the confines of their own sandbox.

Corey Johnston: We need to ensure that nefarious actors can’t break out to the underlying host, to the cluster, or to other customers’ pods. Let’s take a look at our solution. Firstly, we needed a cloud provider that had a global footprint to enable us to stand up a local application stack in regions to minimize latency for the customer. We’ve gone from having just two North American pods to having additional sites in Asia Pacific and Europe to make our apps more responsive.

Corey Johnston: We needed a provider that went beyond basic infrastructure as a service and provided a rich portfolio of managed services, such as RDS, Dynamo, and ElastiCache, so that we could stop having to run these services ourselves. We needed a partner that could scale, so we could ramp down our infrastructure during the quiet periods I mentioned before, and ramp it back up really quickly when our customers’ workloads increased. This enables us to offer competitive pricing for our cloud products. Looking back, broadly speaking, our journey into the cloud can be broken into two phases.

Corey Johnston: Firstly, we re-architected our cloud products to more cloud-native, multi-tenant, SaaS architectures leveraging containers, which we wrapped up around 2016, shifting our products across to Docker, running on EC2 and leveraging auto scaling. This immediately helped us to scale faster.
Then second, in 2017, once we’d moved our workload to Docker, came the shift to Kubernetes and optimizing the way that we orchestrate and run containers. They needed to be both faster for developers and cheaper to run.

Corey Johnston: Why Tigera on AWS? Well, when we started building a platform to support our big, hairy, audacious goal, as we like to say at Atlassian, of running 95% or more of Atlassian’s entire compute workload on our Kube container platform, we realized that security would be a huge focus for our team. One of the key design tenets for securing our platform is defense in depth, or incorporating multiple layers of defense to secure our network and to minimize the blast radius of any potential attack.

Corey Johnston: When implementing defense in depth for the Kube platform, we started by leveraging the fundamental capabilities of AWS, using security groups on EC2 instances to firewall instances such as the control plane and the Kubernetes worker nodes, and using network ACLs to restrict network subnets. Enabling VPC flow logs comprehensively across our deployment gives us security and insight into what’s going on at a packet level. But then, extending this, we needed technology to implement Kubernetes network policy.

Corey Johnston: As platform admins, we specify Kubernetes network policy to block traffic between different tenants for privacy reasons, but allow traffic to flow to shared services like DNS. We allow our end users to layer on custom, dynamic, pod level policies to protect their dynamic workloads. When we surveyed the market back in 2016, there was one clear winner here who provided the technical capabilities we required and the leadership we needed, Tigera.

Corey Johnston: Not only did they provide the layer for network policy enforcement we needed, but we also had the assurance that they were committed to testing it against the leading stacks in Kubernetes. For example, in our case, against Flannel, which we use as our SDN.
Let’s take a look at our Kubernetes stack. Much like our layered approach to security, we designed our Kubernetes stack as a three layer cake, with each layer building upon the output of the last.

Corey Johnston: As you can see, the names are drawn from Knight Rider. No shame, here. Starting at the very bottom of the stack, much like the TV show, where FLAG is the Foundation for Law and Government and the mobile garage for the KITT car, Flag in our stack is our foundation layer. It contains all the base layer AWS components, such as VPC networking, security groups, and other foundational entities. Moving up to the Karr layer, again, just like the TV show, our Karr is programmed for self-preservation.

Corey Johnston: This layer contains all the compute infrastructure used to run Kubernetes. The control plane, consisting of auto scaled API servers and the etcd persistence store, as well as pools of worker nodes. We use Terraform to manage our config. This allows us to meet another design tenet, immutable infrastructure, or put another way, to manage our infrastructure as cattle and not pets. In our previous [inaudible 00:25:57] I spoke of earlier, we used to manage individual servers, repairing them whenever there was a hardware or software fault, and tending to them as pets, where we knew the name of each one of them.

Corey Johnston: The host name. Shifting to an immutable infrastructure paradigm, we now treat our servers as cattle. If there’s a problem with one of them, we simply terminate it and replace it with a brand new instance. Terraform, when combined with our CI pipeline, enables us to achieve this and easily reconciles the running state of the cluster with our desired configuration in code. When the second layer, the Karr, is complete, we’ve got everything in place for a functional Kubernetes API server.

Corey Johnston: What’s Goliath about? It’s our layer three. We’re moving up to Goliath.
Goliath contains all of the in-cluster configuration to customize our Kubernetes cluster. We use a SaaS driven pipeline to deploy our cluster config, consisting of RBAC policy, pod security policy, namespaces, Kubernetes network policy, cluster services, and all the other K8s YAML you need to customize each cluster.

Corey Johnston: Again, it’s managed by our CI pipeline. No surprises, we’re using Atlassian’s Bamboo to build and test this, so we reduce the chance of untested code making its way into production without having passed our pre-deployment testing. This helps us remain compliant by ensuring that our changes have both had a peer review and had a green build prior to a production release. I’m now going to hand over to Matt, who’s going to dive deeper into our security strategy.

Matthew W.: Hey everyone, I’m Matt, I’m a senior engineer on the KITT team, here at Atlassian. So I’m going to talk a bit more about exactly how we secure our Kubernetes clusters with AWS and Tigera. As Corey mentioned, our Kubernetes clusters run arbitrary user-provided code. In our Bitbucket Pipelines clusters, these users are non-Atlassians, and all of these customers’ code runs on the same pool of hardware inside a given cluster.

Matthew W.: This configuration presents a big security risk for us to manage. It increases the surface area that attackers can probe and increases the likelihood that any unpatched machine, or other vulnerability, can be exploited to gain unauthorized access to our systems or data. So naturally, some pieces of a Kubernetes cluster require elevated permissions to perform their job. In a rough order of privilege, from highest to lowest, the components go: etcd, which is the persistent storage and source of truth for the cluster.

Matthew W.: Then the API server, which talks directly to etcd and manipulates the data in there.
Then, the rest of the control plane, which includes the controller manager and scheduler, and they both talk to the API server. Then, any core system services that require some privilege, like kube-proxy, or Calico, or CoreDNS. Finally, there’s the user code, which uses the compute to perform arbitrary tasks. This has the lowest privilege of all the components.

Matthew W.: To capture these required levels of resource access, we break our Kubernetes workloads down into two broad categories, privileged and restricted. Privileged workloads include all of the system components I just mentioned. Restricted workloads are all of our users’ code, and some of our own, that just doesn’t require privilege. These restricted workloads make up the majority of the code that runs in our clusters.

Matthew W.: We try to separate these two kinds of workload from each other on as many different layers as possible to limit the damage that could be done by an attacker. As Corey mentioned earlier, we designed our clusters using the principle of defense-in-depth. Ideally, we want to stop an attacker gaining access to any part of our system. Full stop. However, over a long period of time, the likelihood of a compromise still exists.

Matthew W.: To mitigate the potential impact of an attacker, we have paid careful attention to network segmentation, and where our customers’ code actually runs inside the network. Network ACLs, to block general classes of traffic. EC2 security groups, to ensure separation between discrete types of hosts. And Calico rules, to dynamically enforce layer four network policy, to catch any finely grained requirements.

Matthew W.: The Calico rules we use range from large and sweeping, like denying all restricted workloads from accessing the EC2 metadata servers, to small and specific, like denying particular pods the ability to talk to the Kubernetes API server.
We also pay careful attention to which EC2 instances we allow these workloads to run on. Restricted workloads are not allowed to run on the same EC2 instances as the control plane or any other core system workloads.

Matthew W.: This reduces the chance of an attacker being able to escalate their privileges or exfiltrate sensitive data should they be able to break out of a container. So now, let’s examine what our network and host segmentation looks like in a bit more detail. All of the segmentation is done with the goal of keeping those privileged and restricted workloads as separated as possible.

Matthew W.: We have two major network segments that make up each cluster, the privileged subnets and the unprivileged subnets. The privileged subnets run all of the control plane components for the cluster. This includes the Kubernetes API server, the etcd that stores the persistent state for the cluster, the in-cluster CoreDNS services, and our monitoring and alerting tools for the cluster. The unprivileged subnets run our users’ code and services and any system daemons that we run on all the boxes.

Matthew W.: The biggest difference between privileged and unprivileged subnets is their view of the Atlassian corporate network. On our most secure clusters, the unprivileged subnets cannot see any services running elsewhere in Atlassian. This is accomplished by applying network ACLs to those subnets and restricting their route tables. Another important benefit of organizing our networks like this is that we can block code running in the unprivileged subnets from being able to access anything inside the privileged subnets, and then we can whitelist any required exceptions.

Matthew W.: This means we can allow the bare minimum interaction between untrusted code and core system components. For example, we don’t allow anything in the unprivileged subnets to talk directly to etcd, as that would open up a high impact attack vector.
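The "block everything from unprivileged to privileged, then whitelist exceptions" approach described above can be sketched in a few lines of Python. This is only a model of the decision logic, not how network ACLs or Calico are actually configured; every CIDR, address, and port below is a made-up assumption for the example.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address plan; the real subnets are not given in the talk.
PRIVILEGED_SUBNETS = [ip_network("10.0.0.0/24")]
UNPRIVILEGED_SUBNETS = [ip_network("10.0.16.0/20")]

# Whitelisted (destination network, port) pairs that untrusted code may reach.
# Here: a DNS endpoint assumed to live at 10.0.0.10. etcd is deliberately absent.
ALLOWED_EXCEPTIONS = {(ip_network("10.0.0.10/32"), 53)}


def in_any(addr, networks):
    """Return True if addr falls inside any of the given networks."""
    return any(addr in net for net in networks)


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Model only the unprivileged -> privileged rule; other flows are out of scope."""
    s, d = ip_address(src), ip_address(dst)
    if in_any(s, UNPRIVILEGED_SUBNETS) and in_any(d, PRIVILEGED_SUBNETS):
        # Default deny, unless the destination and port are explicitly whitelisted.
        return any(d in net and port == p for net, p in ALLOWED_EXCEPTIONS)
    return True


if __name__ == "__main__":
    print(is_allowed("10.0.16.5", "10.0.0.10", 53))    # whitelisted DNS lookup
    print(is_allowed("10.0.16.5", "10.0.0.20", 2379))  # direct etcd access
```

Keeping the exception list small and explicit is the point of the design: anything not named in it, such as the etcd client port, stays blocked without needing its own deny rule.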
The Kubernetes API is only accessible via the controller ELB, which can only receive traffic from the corporate network. This prevents any API-based attacks or abuse unless the attacker is already inside our network. Matthew W.: Services may only expose themselves to cluster-external networks via an HTTP or HTTPS port, which can be accessed via the ingress ALB for the cluster. Inside the cluster, services can talk to each other on other ports, but restricted workloads generally cannot talk to privileged workloads. This separation is enforced by Calico rules. Console access to the nodes in the cluster is provided by a set of jump boxes that live in the DMZ. We can SSH into those jump boxes from our corporate network after completing a two-factor auth challenge. Matthew W.: From there, we can SSH on to any node in the cluster for administration purposes. Connections from inside the cluster are not permitted outbound to the jump boxes at all. Despite all of the thought that’s gone into designing our network, to ensure that privileged and restricted workloads stay separated, at some point, they do need to interact. For example, a restricted user build needs to access the privileged DNS workload to find out the IP address of the service it needs. Matthew W.: This is where Tigera is a big part of how we do things here at Atlassian. We use Tigera’s Calico to filter which workloads can talk to each other, based on various metadata within Kubernetes. Workloads can be segregated by their Kubernetes namespace or by arbitrary labels that we apply to them. This allows a great degree of flexibility for our platform and lets us create the simplest possible rules to give our clusters the security that we need. Matthew W.: There’s a few rules of thumb that we used when creating our network policy enforcement rule set. All privileged workloads can talk to all other privileged workloads without restriction.
Restricted workloads can talk to privileged workloads, but only on a case-by-case, whitelisted basis. By default, restricted workloads are completely isolated from each other. They have to opt in to be able to talk to each other. Matthew W.: Privileged workloads generally run in specific umbrella namespaces, so kube-system is an example of a privileged namespace. This is important, because it allows us to categorize our workloads broadly and reduce the amount of config that we have to manage. Workloads that listen on host ports should be rare, and when they exist, they should have their traffic filtered by both EC2 security groups and Calico rules. Security groups cannot stop traffic looping back to a service running on the same box, like the kubelet, for example. Matthew W.: So Calico rules are used to provide defense-in-depth here. Having both allows us to be more confident in our security. Some examples of rules that we have in place can be seen in the diagram on the right. Known bad actors are not allowed to be contacted on egress from workloads. Goliath lets us keep … Our Goliath layer, that Corey mentioned, lets us keep up-to-date lists of bad actors, which can be rendered into Calico rules, which are then deployed to our clusters. Matthew W.: Restricted workloads cannot talk directly to the AWS metadata service. This is very important, as access to the metadata service can allow an attacker to access potentially sensitive resources hosted in AWS, like S3 buckets. It can also allow for privilege escalation. In certain high-risk clusters, like Bitbucket Pipelines, we stop restricted workloads from talking to the DNS service at all, just to remove potential attack and discovery vectors. All workloads have unrestricted access to private S3 buckets. Access control here is handled by AWS IAM and bucket policies. Matthew W.: We don’t want to stop our users accessing what they should be able to access.
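The rules of thumb described above (privileged workloads talk freely to each other, restricted workloads reach privileged ones only through a whitelist, and restricted workloads are isolated unless they opt in) can be sketched as a small decision function. This is only a model of the stated rules, not how Calico evaluates policy; the `Workload` type, its fields, and the workload names are hypothetical. The talk does not cover traffic from privileged to restricted workloads, so this sketch assumes it is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    privileged: bool
    # Names of restricted peers this workload has opted in to accept traffic from.
    accepts_from: frozenset = frozenset()


def is_allowed(src: Workload, dst: Workload, whitelist: set) -> bool:
    """Return True if `src` may open a connection to `dst`.

    `whitelist` holds (source name, destination name) pairs for the
    case-by-case exceptions from restricted to privileged workloads.
    """
    if src.privileged and dst.privileged:
        # Privileged workloads talk to each other without restriction.
        return True
    if not src.privileged and dst.privileged:
        # Restricted to privileged: only explicitly whitelisted pairs.
        return (src.name, dst.name) in whitelist
    if not src.privileged and not dst.privileged:
        # Restricted to restricted: isolated unless the destination opted in.
        return src.name in dst.accepts_from
    # Privileged to restricted is not described in the talk; assume deny.
    return False


dns = Workload("coredns", privileged=True)
api = Workload("kube-apiserver", privileged=True)
build = Workload("user-build", privileged=False)
cache = Workload("build-cache", privileged=False, accepts_from=frozenset({"user-build"}))
other = Workload("other-tenant", privileged=False)

whitelist = {("user-build", "coredns")}

print(is_allowed(build, dns, whitelist))    # whitelisted DNS lookup
print(is_allowed(build, api, whitelist))    # not whitelisted
print(is_allowed(build, cache, whitelist))  # destination opted in
print(is_allowed(other, cache, whitelist))  # isolated by default
```

Keeping the default outcome a deny means any new restricted workload starts isolated, which matches the opt-in behaviour described in the talk.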
One big win that we’ve had with Tigera’s Calico is empowering our users to extend our global filtering rules with specific rules of their own. We order our filtering rules in such a way that, after the mandatory filtering is done, customers can add new rules via Kubernetes network policy objects, to allow or deny traffic to their service as they require. This way, we can cover any extent of security that our customers need for their workloads, without having to be manually involved. Matthew W.: Our customers get ultimate control over their service’s security, and they aren’t dependent on us to make changes on their behalf. So, some of the benefits of Tigera on AWS. They’ve helped us stay agile. Utilizing our Goliath layer to template out filtering rules for our clusters, we can build and deploy updates to our network policy configuration across all of our clusters in less than 20 minutes. Matthew W.: As of recently, we had more than 20 clusters to manage, so that’s seriously quick. The previously mentioned ability for teams to be able to control and customize their own levels of security is another big win for company-wide agility. If teams don’t have to contact us, then they can do their job faster. Tigera has been used beyond the KITT team as well. Our network engineering team has used Tigera’s Calico to enforce export control and filter traffic flows for services running on top of our old PaaS. Matthew W.: In that configuration, each service runs on an individual EC2 instance, so allowing rules to propagate across that many boxes is quite impressive. One real-world and real-time example of how Tigera’s Calico has helped us is in defusing Bitcoin mining abuse on Bitbucket Pipelines. One day, the Pipelines team paged us into an incident room and alerted us to some potential Bitcoin mining happening inside their cluster.
Matthew W.: Minutes later, we had identified the command used by the offending builds, which contained the domain name of the Bitcoin mining pool. We immediately moved to deny the IP addresses associated with the domain name using Calico rules. After verifying that those rules blocked the expected traffic in a development cluster, we applied them to the production cluster via the Kubernetes API and halted the abuse immediately. This is a big win, because without Tigera we may have had to script a solution to the problem dynamically. Matthew W.: This would likely be more error-prone and much slower. Looking ahead on our road map, there’s two main initiatives that we are working with Tigera on. First and foremost is the migration of our microservices onto our next gen PaaS. Powered by Kubernetes, this PaaS will enable every developer at Atlassian to build and ship faster. Kubernetes enables faster iterations, because deployments can be stood up in seconds, not the minutes it takes on our current PaaS. Matthew W.: Individual EC2 instances take much longer to stand up than just spinning up a new Docker container on an existing Kubernetes node. Tigera’s helping us address application layer policy requirements for this next gen PaaS. All of the rules I mentioned previously are done at the transport layer. We do TCP and UDP filtering, based on raw IP address ranges, or on Kubernetes metadata, that gets transformed dynamically into IP address ranges and lists by Calico. Matthew W.: As we on-board more and more different types of workload, our users’ requirements are increasing the need to define policy in terms of application layer constructs. One example that comes up often is the ability to restrict access to external non-cluster resources by their domain name.
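As a rough illustration of the incident response described above, where the IP addresses behind the mining pool’s domain were denied with Calico rules, the sketch below turns a statically enumerated list of addresses into a deny-egress GlobalNetworkPolicy. It is not the rule Atlassian deployed: the policy name and `order` are hypothetical, the addresses are reserved documentation addresses, and the list would have to be regenerated whenever the domain’s records change.

```python
import ipaddress
import json


def deny_egress_to_ips(name: str, addresses: list) -> dict:
    """Render a Calico GlobalNetworkPolicy that denies egress from all
    workloads to the given addresses.

    Bare IPs are normalised to CIDR form (for example a /32 for IPv4)
    and duplicates are dropped.
    """
    nets = sorted({str(ipaddress.ip_network(addr)) for addr in addresses})
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "GlobalNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "order": 5,  # hypothetical: evaluated before tenant rules
            "selector": "all()",
            "types": ["Egress"],
            "egress": [
                {"action": "Deny", "destination": {"nets": nets}},
            ],
        },
    }


# Placeholder addresses from the RFC 5737 documentation ranges.
pool_ips = ["203.0.113.10", "198.51.100.7", "203.0.113.10"]
policy = deny_egress_to_ips("deny-mining-pool", pool_ips)
print(json.dumps(policy, indent=2))
```

Because the rule is built from a fixed list of addresses, it stops matching as soon as the domain resolves somewhere else.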
Currently, we’re only able to block the IP addresses associated with a domain name by enumerating them statically, but looking at the application layer will allow us to stay protected even if the IP addresses behind the domain change. Matthew W.: We’ve been working with Tigera to get these features ready for our next gen PaaS and satisfy our developers and security teams alike. The second exciting area we’re working on with Tigera is Windows. With Windows containers gaining more and more momentum in the container ecosystem, we’re also keen to support running Windows pods in our Kubernetes clusters. The main use case for this is that some Atlassian teams would love to be able to use the same CI/CD pipeline and tooling they already use to build Windows binaries of our apps. Matthew W.: Okay, that’s it for me. I’m going to hand over to Karthik from Tigera now. Karthik Prabhak: Wow. Wonderful, what a fantastic presentation by Matt and Corey. Thanks Matt. Thanks Corey. I don’t think there’s any way I can top that walk through that Corey and Matt did. I mean, come on, they even have the Hoff, and obviously KITT, and the team that they have there, they managed to get it onto the slides, which is incredible. Believe it or not, I tried convincing our marketing folks that we should try and get perhaps a Baywatch-themed slide deck, but that was quickly nixed. Karthik Prabhak: All of that said, I want to just wrap things up here with a few things for you to consider. As you consider some of the points that Matt and Corey made in their journey, and you think about the applicability to your own sort of migration to microservices, leveraging Kubernetes on AWS, there are sort of three fairly big and important questions that you need to ask yourself. Firstly, and most importantly, how do you secure your workloads when they’re being deployed in a very declarative and orchestrated platform, like Kubernetes running in AWS?
Specifically, as you look at microservices, which are spinning up dynamically, and increasingly not just in a single cluster, but with canary deployments spanning multiple clusters. Karthik Prabhak: How do you enable your security and your policies for control, to protect and provide wrappers around these workloads, which are now dynamic? The question you need to ask yourself is, do you sacrifice security for agility, or do you compromise on agility in the name of security, right? Or, like Atlassian have chosen, is there a way that you can codify your security policies and the corresponding actions and orchestrate them in lockstep with your workloads, and ultimately tie them to your workload identity, which is now very dynamic. Is that something that you can consider? Karthik Prabhak: The second thing that you should ask yourself is, how do you know that you’re secure? Do you have the right visibility and traceability tools to help you make sense of the raw data? There is going to be a lot of raw data coming at you when you deal with a dynamic platform like Kubernetes. In order to thwart attacks, or to respond to indicators of compromise, or to deal with bad actors, like in the example that Matt provided on the Bitcoin mining situation, do you have the right level of visibility to make sense of what’s going on, so that you can automate some of the security responses that you take? Karthik Prabhak: And thirdly, it’s not just about knowing that you’re secure, but how can you prove to others that you’re secure? Specifically, the thing to consider is how can you prove this to auditors? Especially for those of you who need to ensure compliance, whether it is to various internal business controls or, very often, to regulatory controls, how do you prove to auditors that you are indeed secure? And to be able to do that on a continual basis, in lockstep with new application rollouts.
Because, as you all know, these days, application rollouts happen by the second, rather than every few weeks or months, which was the case years ago. Karthik Prabhak: Really, the consideration that you need to look at is, how do you ensure continuous compliance in lockstep with CI/CD, right? I’m not sure if CC has become a common term, but ultimately, this is about ensuring continuous compliance, even as your applications and microservices are continuously iterated. As you consider those three questions, I think it’s also important to make sure that whatever solutions you discover, that help you address those three challenges, are also consumed as natively as possible within Kubernetes and AWS. Karthik Prabhak: Ultimately, this is really about ensuring that the tools that you use, the solutions that you use, all work with the existing infrastructure, the existing processes that you rely on, and that you don’t have to go through any artificial acts to be able to integrate these solutions. At Tigera, this is the problem that we’ve focused on solving, together with our customers and partners such as Atlassian and AWS. To give you a little bit of context here, as Matt and Corey pointed out, Tigera solutions, such as Calico and Tigera Secure, which builds on top of Calico and ties into Kubernetes and service mesh technologies, are very deeply integrated with the base orchestrator. Karthik Prabhak: We leverage not just the superficial information, such as IP addresses and port numbers, but a lot more detailed metadata that provides the context for the application within the orchestrator. We leverage this detailed metadata in helping operators make security decisions, and to help them in orchestrating actions, based on policies that are defined. Also, in terms of how we report back and provide visibility, not just at the raw level, but with the intelligence that is specific to the orchestrator.
Part of that is also making sure that when you have workloads running in Kubernetes across one or more clusters, you can also enable fine-grained access to not just other workloads running in the same cluster, or in Kubernetes, but also to workloads running outside of Kubernetes. Karthik Prabhak: As an example, with Tigera Secure, we integrate very closely, leveraging not just concepts like network policy in Kubernetes, but also in terms of how we integrate that very seamlessly into the rest of the infrastructure. For example, being able to ensure fine-grained policy control, to say that certain pods, or a certain tenant’s pods, are allowed access to that particular client’s other services, such as perhaps an AWS RDS, or ElastiCache, or other services, or perhaps there are external EC2 instances that these pods need to access. But to ensure that pods running alongside them on the same node, belonging to a different application, or belonging to a different tenant, are not given that same access, and that these tenants and applications are isolated from each other, and done so dynamically. Karthik Prabhak: This is where we’ve provided some very deep integration, working within the Amazon partnership, to ensure that this works seamlessly with integrations like security groups. Ultimately, the visibility for all of these actions, the policy actions, the metadata, the raw information, can be consumed via native services, such as AWS CloudWatch. With additional detailed metadata, and also detailed interpretation of that metadata, provided in a variety of ways, including, obviously, with CloudWatch and CloudTrail. Karthik Prabhak: Now, this is all fine and good when you want to consume the raw data. But taking it one level beyond that, Tigera has also been focused on the notion of continuous compliance. Ultimately, keep in mind that your workloads are being iterated and changed dynamically and continuously.
If you’re doing an audit check once a week, or once a month, which has been the case historically, keep in mind your audit check is no longer accurate. Karthik Prabhak: This is where Tigera has been focused on the notion of continuous compliance, where we keep a complete state of all policies and infrastructure at any given time. Together with that, we correlate it with a complete view of all flow logs, in terms of what are allowed and denied flows, including flows that were denied at the workload itself. Together with that raw metadata, we correlate it with additional metadata from the base platform, to be able to allow you to generate dynamic compliance reports, where you can pick any instant in time, and say, at that given instant, what was the state of compliance based on your definition of compliance. Karthik Prabhak: Whether it is for internal business control, or for regulatory reasons. To be able to say that, hey, at that given instant, what workloads were compliant by our definition of compliance? What workloads were trying to violate policy? What were the policy violation attempts and what does that mean? Were there any security anomalies that are indicators of compromise? And to give you additional intelligence based on that, based on correlating it with additional metadata and deeper statistical analysis that we do. Karthik Prabhak: Really, to wrap things up, I want to leave you with a few thoughts, right? As you consider migrating to a new environment that provides the agile application delivery and deployment platform, with Kubernetes on AWS, keep in mind that your workloads are now moving to a very declarative and dynamic pattern. In this sort of a dynamic microservices environment, it is incredibly important to now leverage a strong identity and combine that with a defense-in-depth posture, leveraging concepts like network policy and application policy, that are now all codified and enable you to allow your workloads to move around between clusters.
Karthik Prabhak: Enabling, for example, canary deployments or blue-green deployments, but to make sure that your security policies can follow your applications as they get deployed within different clusters. Secondly, to make sure that you have detailed traceability and visibility, via native AWS constructs, for the raw visibility of things like flow logs and policy metrics. Together with that, being able to make sense of that raw data, so that your policies, both at the network level and the application level, can be enforced at multiple points of enforcement. Really, providing a defense-in-depth posture. Karthik Prabhak: Obviously, as you get to applications and infrastructure which have been defined as code, you really want your security policies to also be codified. But in addition to just the policies being codified themselves, I think for most organizations today, if you’re relying on manual human intervention to respond to all security threats, you are going to be heavily, heavily overloaded and overworked. So really, you want to focus in on how you can automate as much of your security actions as well, together with your policy controls. Karthik Prabhak: From that perspective, ultimately, various elements of the security operations, and elements of what goes into your SOC, I think things like incident response, and the workflows you take with that, can be automated in lockstep with network and application policies. These are all constructs that we’ve really been focused on and helping our customers solve. There’s two main solutions that we focus on at Tigera, building on top of Calico, but focused on the more advanced network security workflows, from an operations and organizational perspective. Karthik Prabhak: There’s a Tigera Secure Enterprise Edition and there’s a Tigera Secure Cloud Edition. Both are fully certified and work very closely with both self-managed Kubernetes clusters on AWS, as well as the hosted Kubernetes clusters on AWS EKS.
As a quick reward for you attending this webinar, we have a promo for Tigera Secure Cloud Edition, and please do try this out. It is fairly easy to get started, but do reach out to us as well with any of the links mentioned. Karthik Prabhak: We’d love to collaborate with you and look forward to getting your feedback and collaborating with you moving forward. Thank you so much for your time. With that, I’ll hand it back to our hosts for the afternoon. Carmen Puccio: Yeah, awesome. Thank you Karthik, Matt and Corey. What we’re going to do now is we’re going to hop into our live Q & A. As a reminder, the attendees on the webinar, you’ll be able to submit any written questions through the questions panel. In the event that we aren’t able to answer your questions today, we’ll follow up with everyone individually via email. Carmen Puccio: There were a couple of questions around will the slides be available? Will this video be available? I addressed that earlier, but I’ll just restate it real quick. Yes, the slides will be available in about two to three days and the video will be out as well. So you can definitely check it out again and send it to your teams. Carmen Puccio: With that said, I kind of want to jump into this really quick with the limited time. Karthik, one of the questions that’s popped up is essentially around like, you know, a customer has teams with various different needs at the network security level. How do you address the needs of different teams when your security policies affect their workflows? Karthik Prabhak: Oh, that’s a great question. That sounds very much in line with one of the key areas that we have been focused on, which is that … Look, when you look at platforms like Kubernetes, and how it allows developers and individual application owners to define their own policies, that’s great.
But when you look at your real-world deployments, often organizations have a need to say that, that’s great that applications can define their own fine-grained policies, but as a security operator, as a network operations team, as a cloud platform team, we’re going to need additional controls to bound these individual application developer policies. Karthik Prabhak: From that perspective, Tigera Secure has built out a very nicely integrated set of controls, that we call hierarchical policy tiers, which allow an operator to supersede the developer policies with other policies that take higher precedence. Allowing an operator to isolate individual applications and tenants from each other, but also to be able to insert and inject rules, whether it is to thwart an attack at a finer-grained level, or whether it is for monitoring and visibility. These are policies that are then tied in to organizational roles by RBAC. Karthik Prabhak: So this gives a great facility to codify not just your individual policy rules, but to also codify some of your security and visibility practices, that you need to look at in how you operate the cluster in production. I encourage folks to come take a look at Tigera Secure. This is actually a feature that we have as a differentiating feature in the product and something that we’d be happy to engage with you on further. Carmen Puccio: Cool. Another one for you, as well. Excuse me. For customers that are adopting service mesh on Kubernetes, and trying to decide where they need to enable a security policy, does Tigera have any best practices or recommendations around this? How can you guys help? Karthik Prabhak: I think that’s a great question to key up for a future webinar as well. Tigera has been very involved in service mesh technologies from the early days.
As an example, we’ve been very involved in service mesh technologies which are increasingly gaining mind share, such as Istio, and also others. From Tigera’s perspective, we have continuously innovated with Calico to unify policies, not just at the Kubernetes and network policy layer, but to also include application policies, leveraging work that is happening in service mesh technologies, leveraging sidecar proxies, like the Envoy proxy. Karthik Prabhak: But together with that, we’ve also been very focused on the concept of workload identity. Specifically, in how you can leverage a cryptographic identity that is being assigned to a workload in policy enforcement. The concept being that, as your workloads move around between clusters and as you get to canary deployments, your policies really follow your workloads. This is an area where Tigera has been working very closely within the upstream Kubernetes and service mesh communities in enabling this dynamic concept of security policies with workload identity. So again, a great topic that we can dive into deeper, or perhaps follow up with us and we can give you more color. Carmen Puccio: Okay, cool. This one, actually, is also for you, and there was another question in there, so I’m going to kind of sum them up, right, but it’s around PCI compliance and visibility. If a customer needs to see the network flow, and what’s essentially denied at the pod level, are there ways to address this? Then, at the same time, too, one of the questions is, do you have any pre-configured rules that help them achieve PCI compliance? Like, how does that work? Karthik Prabhak: Mm-hmm (affirmative). No, great questions. Yeah, PCI is one common compliance framework that we get a lot of interest in from our customer base. There’s a number of other compliance frameworks. Each tends to have slightly different requirements, in terms of network compliance reporting, as well as the controls that would need to be put in place.
Karthik Prabhak: In each of these cases, ultimately, what is a fundamental requirement is to be able to provide detailed visibility. To not just provide that visibility, but to be able to generate that proof dynamically, saying that at any given instant, explain and show, in an audit report that can be generated dynamically, what the compliance posture is. Specifically, what workloads were part of that, that need to meet PCI compliance. What was the state of the workloads? Karthik Prabhak: What was the state of the flows happening with these workloads to other workloads within the infrastructure? And to be able to report on that dynamically. Ultimately, at Tigera, with Tigera Secure, we capture that detailed metadata. We capture the detailed visibility. We also correlate that with other pieces of metadata, but ultimately, what we enable operators to do is to be able to generate dynamic compliance reports that operators of infrastructure and applications on Kubernetes in AWS can consume via CloudWatch and CloudTrail, together with templates, to report on things like PCI compliance. Karthik Prabhak: Obviously, these can be customized, based on the specific circumstances of the cluster, but we provide a number of common templates to help operators and application owners get started. Great question. Carmen Puccio: Awesome. Awesome, awesome. Thank you. On that note, we’re going to wrap up today’s webinar. Just one final reminder, you will receive an email, within the next two or three days, with a link to these slides on SlideShare and a recording of today’s webinar. We want to thank you very, very much for attending. Thank you for taking the time today. If you have any other questions, please don’t hesitate to reach out. With that said, thank you to Matt and Corey from Atlassian. Thank you to Karthik from Tigera. And thank you to you for attending. Thanks all. Bye.