Adventures in Partitioning
Or, How to map my current operational model into the brave new Kubernetes world
Recently, I was working with a large customer of ours as part of an engagement to help them work through their security posture for their Kubernetes environment(s). As with most large customers, they are most certainly not greenfield. They have decades of existing infrastructure and, more importantly, process, that simply can’t (and shouldn’t) be ignored.
Very often, when I come into these situations, it would be polite to call the existing environment “organically evolved,” when, in reality, it’s a bit of a dog’s breakfast, and the customer is madly trying to get out from under as much of it as possible. That, however, was not the case here. While the customer does want to simplify where possible, most of the process and relevant infrastructure is actually fairly well thought out. Is it what you would deploy today? No, but it is certainly in the same ballpark, and philosophically a reasonable approach. So, unlike the usual engagements, where we try to separate the wheat from the chaff, the conversation turned more to adaptation.
The key goal they want to accomplish is to segregate Kubernetes clusters into production ‘zones’ (i.e. production, production test, dev, and QA). Initially, they also have a requirement to isolate the production clusters by level of risk, but that requirement may be relaxed over time as they gain experience.
As another dimension, they want to separate applications and application teams by using Kubernetes namespaces, but they do need communication between those applications.
In support of their existing model, they have a fairly comprehensive database that maps each user to their ‘stage’ (e.g. prod, dev, qa), their role in that stage (developer, administrator, etc.), and the applications/resources they are allowed to work on.
Let’s look, in an abstract way, at what that database contains.
- Each application and resource is given an ID and has some metadata attached to it.
  - Who owns the application/resource
  - Other metadata about the application
- Each “user” (human or otherwise) has an ID and there is metadata attached to it.
  - What application IDs they are allowed to interact with
  - What role they have (developer, production operations, admin, testing engineer, etc.)
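To make the shape of those records concrete, here is a minimal sketch in Python. The class and field names (`Application`, `User`, `app_id`, `stages`, and so on) are my own illustrative assumptions; the customer's actual schema was not shared and is almost certainly richer than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Application:
    """An application or resource record (hypothetical field names)."""
    app_id: str
    owner: str                                   # who owns the application/resource
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class User:
    """A human or machine identity (hypothetical field names)."""
    user_id: str
    stages: Set[str]                             # pipeline stages, e.g. {"dev"}
    role: str                                    # e.g. "developer", "admin"
    app_ids: Set[str]                            # applications the user may touch


# Example records, invented purely for illustration.
payments = Application(app_id="payments", owner="alice")
alice = User(user_id="alice", stages={"dev"}, role="developer",
             app_ids={"payments"})
```

Note that the stage and the application list are separate fields on the user; nothing in the record ties a particular application to a particular stage.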
The nice thing about this model is the level of indirection it provides: a user’s role is tracked separately from their applications, and the applications a user can access or modify are decoupled from the production pipeline stage they are allowed to operate in.
In many systems I’ve seen, these two facts (stage and application) are coupled together. As an example, if you had three applications and three stages, you would actually have nine “labels” for a given user. For example, if we had dev, test, and prod stages, and CustomerRecords, Payments, and Inventory as applications, the close coupling would require one “group” per stage and application pair (a dev CustomerRecords group, a dev Payments group, and so on through a prod Inventory group) to drive who can do what.
However, in a decoupled environment, you have fewer “groups” (six here: one per stage plus one per application), with a “user” belonging to potentially more than one group.
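The arithmetic is easy to check with a few lines of Python. This is only a sketch: the `stage-application` naming for the coupled groups is an assumption made for illustration, not the customer's convention.

```python
from itertools import product

STAGES = ["dev", "test", "prod"]
APPS = ["CustomerRecords", "Payments", "Inventory"]

# Coupled: one group per (stage, application) pair, so the count multiplies.
coupled_groups = [f"{stage}-{app}" for stage, app in product(STAGES, APPS)]

# Decoupled: one group per stage plus one per application, so the count adds.
decoupled_groups = STAGES + APPS


def coupled_memberships(stages, apps):
    """Groups a user needs when stage and application are fused together."""
    return {f"{s}-{a}" for s, a in product(stages, apps)}


def decoupled_memberships(stages, apps):
    """Groups a user needs when stage and application are independent."""
    return set(stages) | set(apps)


print(len(coupled_groups), len(decoupled_groups))
```

With three stages and three applications the gap is modest, but the coupled count grows as stages times applications while the decoupled count grows as stages plus applications.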
This is starting to look like a concept that we talk about quite a lot here at Tigera: micropolicies. For an introduction to the concept, you can watch a great webinar that we did this year on the topic: 5 Tips for Organizing Your Kubernetes Network Security Policies. The idea is that you should not write complex policies that control more than one concept at a time. To make this simplified approach work, however, you really need hierarchy in policy enforcement. This is now REALLY looking like things that we talk about here at Tigera, like the “Policy Tiers” in our commercial product, Tigera Secure Enterprise Edition.
In this case, the customer wants the stage (dev, qa, prod-test, and prod) to be the higher priority isolation and the application the next lower “tier.”
So, how should we implement this? The answer is to use Kubernetes API RBAC rules. By referring to the stage metadata in a user’s record, we can write a simple set of RBAC rules that only allow “dev” users to access any cluster that is operating at the dev stage. Wash, rinse, and repeat for the rest of the stages. This will be done at the “root” of the API server for a given cluster: unless the user and the cluster are in the same “stage,” the user will simply not be authorized to do anything on that cluster.
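As a rough sketch of what a stage-level rule could look like, the Python below builds a ClusterRoleBinding manifest that binds a stage group to a ClusterRole. It assumes the cluster's authentication layer surfaces the user's stage as a group name (for example `dev`), and the ClusterRole name `stage-user` is hypothetical; you would define it yourself with whatever cluster-wide permissions stage members need.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def stage_cluster_role_binding(stage: str, cluster_role: str = "stage-user") -> dict:
    """Build a ClusterRoleBinding granting `cluster_role` to the stage group.

    `stage-user` is a hypothetical ClusterRole, not a Kubernetes built-in.
    """
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{stage}-stage-access"},
        "subjects": [
            {"kind": "Group", "name": stage, "apiGroup": RBAC_API_GROUP},
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": RBAC_API_GROUP,
        },
    }


# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(stage_cluster_role_binding("dev"), indent=2))
```

Because Kubernetes RBAC is purely additive, the gate comes from where the binding is installed: the `dev` binding would exist only on dev-stage clusters, so a user who is only in the `prod` group matches no binding there.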
Next, each application namespace will have RBAC rules that will only allow users in the same application group as the namespace to make namespace API calls.
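The namespace-level rule can be sketched the same way, this time as a RoleBinding scoped to a single namespace. The assumptions here are that application membership also arrives as a group name (for example `Payments`) and that the built-in `edit` ClusterRole is an acceptable level of access; a real deployment might well bind a narrower, custom role.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def app_role_binding(app_group: str, namespace: str,
                     cluster_role: str = "edit") -> dict:
    """Build a RoleBinding that grants `cluster_role` inside one namespace only.

    A RoleBinding may reference a ClusterRole; the permissions it grants are
    still confined to the RoleBinding's own namespace.
    """
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {
            # Object names must be lowercase, so normalise the group name.
            "name": f"{app_group.lower()}-access",
            "namespace": namespace,
        },
        "subjects": [
            {"kind": "Group", "name": app_group, "apiGroup": RBAC_API_GROUP},
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": RBAC_API_GROUP,
        },
    }


print(json.dumps(app_role_binding("Payments", "payments"), indent=2))
```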
Combined, no user should be able to access any cluster/namespace tuple unless they are in the right stage, and the correct application group.
So, why do this? What’s the benefit? Let’s take a user who is an “Owner” of, say, the Payments application, or the “Admin” of all Production environments. In the first case, you could have an RBAC policy that allows anyone tagged “Owner” access to any cluster, but they would still be limited by the RBAC policy on the application namespace to only have access to their application (Payments). Similarly, someone who is labeled Admin and Prod would have access (via the Prod label) to all Prod clusters, and Admin would allow them access to all applications via the namespace RBAC rules. They would not, however, have access to any other stage’s clusters.
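The combined effect of the two levels can be modelled as two independent checks that must both pass. The sketch below is a toy model of that decision, not how the Kubernetes API server evaluates RBAC; the `Principal` type and the treatment of “Owner” and “Admin” as wildcard roles are my reading of the scenario described above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Principal:
    """Toy stand-in for a user record (hypothetical structure)."""
    name: str
    stages: FrozenSet[str] = frozenset()
    apps: FrozenSet[str] = frozenset()
    roles: FrozenSet[str] = frozenset()


def stage_allows(user: Principal, cluster_stage: str) -> bool:
    # Cluster-level gate: Owners may reach any cluster, others only their stages.
    return "Owner" in user.roles or cluster_stage in user.stages


def app_allows(user: Principal, namespace_app: str) -> bool:
    # Namespace-level gate: Admins may reach any application namespace.
    return "Admin" in user.roles or namespace_app in user.apps


def can_access(user: Principal, cluster_stage: str, namespace_app: str) -> bool:
    # Both gates must pass; neither rule needs to know about the other.
    return stage_allows(user, cluster_stage) and app_allows(user, namespace_app)


payments_owner = Principal("pat", apps=frozenset({"Payments"}),
                           roles=frozenset({"Owner"}))
prod_admin = Principal("sam", stages=frozenset({"prod"}),
                       roles=frozenset({"Admin"}))

print(can_access(payments_owner, "dev", "Payments"))    # True
print(can_access(payments_owner, "prod", "Inventory"))  # False
print(can_access(prod_admin, "prod", "Inventory"))      # True
print(can_access(prod_admin, "dev", "Payments"))        # False
```

Each check is a single, simple rule; the interesting behavior only appears when they are layered.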
So, we now have constrained the customer’s userbase to the desired subset of capabilities, all with very simple, loosely coupled policies, enforced at multiple levels. This is starting to sound like the Tigera story, just with RBAC, isn’t it?
Now, how does Tigera fit into this? With our tiered policy, those same RBAC rules determine who can edit a given policy tier, so only users with the right credentials are able to write policies for traffic within (or even between) namespaces. Only people in the Inventory application group would be able to write inventory namespace policies, for example. And if a user isn’t in the correct stage, they won’t be able to do anything against the cluster’s API server at all, which effectively blocks them.
Now, what do we do to allow the Customer Records application microservices to talk to the Payments microservices? In this case, we can write GlobalNetworkPolicies, a unique feature of Calico and Tigera’s solution that allows a network policy to be written that applies across multiple Kubernetes namespaces. So, in the case above, a security admin in the Payments group could write a policy that would allow certain pods (identified by a label) in the Customer Records namespace to connect to certain pods (again identified by label) in the Payments namespace. A matching outbound policy would be written by the security engineer in the Customer Records application team.
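To give a feel for the inbound half of that pair, here is a hedged sketch that builds a Calico `GlobalNetworkPolicy` manifest as a Python dictionary. The namespace names, pod labels, policy name, and port are all invented for illustration, and the field names follow my understanding of the `projectcalico.org/v3` schema; check them against the documentation for your Calico or Tigera version before relying on them.

```python
import json


def allow_cross_namespace_ingress(name: str,
                                  dest_namespace: str, dest_selector: str,
                                  src_namespace: str, src_selector: str,
                                  port: int) -> dict:
    """Allow TCP ingress to labelled pods in one namespace from labelled pods
    in another. All argument values used below are hypothetical examples."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "GlobalNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # Which pods the policy applies to (the receiving side).
            "namespaceSelector": f"projectcalico.org/name == '{dest_namespace}'",
            "selector": dest_selector,
            "types": ["Ingress"],
            "ingress": [
                {
                    "action": "Allow",
                    "protocol": "TCP",
                    # Which pods are allowed to connect (the calling side).
                    "source": {
                        "namespaceSelector":
                            f"projectcalico.org/name == '{src_namespace}'",
                        "selector": src_selector,
                    },
                    "destination": {"ports": [port]},
                }
            ],
        },
    }


policy = allow_cross_namespace_ingress(
    name="payments-allow-customer-records",
    dest_namespace="payments", dest_selector="role == 'payments-api'",
    src_namespace="customer-records", src_selector="role == 'billing-client'",
    port=8443,
)
print(json.dumps(policy, indent=2))
```

The matching outbound policy from the Customer Records team would mirror this with `types: ["Egress"]` and an `egress` rule whose `destination` selects the Payments pods.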
Now, by combining Tigera tiered policy with multiple levels of RBAC control in the Kubernetes API, we can leverage the customer’s existing authentication and authorization resources to build a secure multi-tenant, multi-stage Kubernetes environment.