In the previous post, I introduced the idea that a cloud-native or microservice application environment differs in some fairly fundamental ways from the environments that preceded it.
In this post, I’m going to explore one of these platforms to give you some grounding in what they do and, at a very high level, how they do it. There are a number of platforms in the container management space, such as Docker’s Swarm and Mesosphere’s DC/OS, that do many of the same things with the same concepts, but with significant differences between them. Kubernetes, however, seems to have the greatest adoption, so that is the one I will use as an example. Kubernetes is an open source project that originated at Google and is based in part on Borg, the internal infrastructure Google has been using for over a decade.
A full discussion about Kubernetes is beyond the scope of this blog post, but you can read a great write-up about the concepts involved for more details. That said, here are some of the high points:
Modern applications are assembled from modular bits of code that are religiously curated from code repositories, put through an automated test and deployment CI/CD model, and assembled just-in-time with ephemeral run-time components and connections. This JIT assembly, along with agile development patterns, is enabling even legacy businesses to become software enabled, or even software driven.
While this approach has driven some radical changes in software development, it has also introduced an impedance mismatch between today’s software development process and the infrastructure that hosts the artifacts, or output, of that process.
The increased release speed a software-enabled business requires cannot be achieved if an application stack can be updated and scaled multiple times a day, but it takes days, weeks, or even months to make the corresponding changes to the supporting infrastructure. All you have done is kick the can down the road; the end result is unchanged.
This is not a new problem. Platforms such as Kubernetes address it with an “everything as code” approach: the infrastructure, and its relationship with the software and applications it supports, is defined through well-defined, modular interfaces that can be manipulated as code.
These code fragments are simple declarations of how you intend the infrastructure to interact with your code and services. Most importantly, they are life-cycled just like the rest of your code, and probably alongside your code: they are checked into a source code repository like git, deployed and validated through the same CI/CD pipeline, and so on.
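As a concrete illustration, a Kubernetes Deployment manifest is exactly such a declaration of intent: a few lines of YAML, versioned in git next to the application code, that tell the platform what should be running. The names and image tag below are illustrative placeholders, not anything from a real environment:

```yaml
# deployment.yaml -- a declaration of intent, versioned in git with the app code.
# Names and the image reference are hypothetical examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                  # intent: keep three copies running at all times
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.4.2
        ports:
        - containerPort: 8080
```

A CI/CD pipeline would typically apply this with `kubectl apply -f deployment.yaml` after the usual review and test gates, the same way it ships the application code itself.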
Since the code defining the microservice or application, and the code defining the interaction between the code/microservice and the infrastructure are developed, managed, and life-cycled concurrently, the evolution of the infrastructure tracks the evolution of your code base.
Similarly, the same mechanisms you use to control the deployment of your code, such as peer review and automated testing, can also be applied to your infrastructure. This is vastly superior to manual intervention or e-mailed trouble-ticket requests.
This concept is widely referred to as “everything as code,” and it underpins both the microservice revolution and the infrastructure that supports it. Thankfully, Kubernetes is built on “everything as code.”
This is one of the key differences between the previous generations of platforms and the cloud-native model. In previous platforms, each instance of an application or server was a specific “pet.” VMs were managed no differently than physical servers. New versions of software would be loaded onto the VM and updated. If security vulnerabilities were discovered, those VMs would be patched, very often manually, one by one. Operational issues would be resolved by operations teams logging into the affected servers or VMs to make the necessary adjustments.
The problem with this model is that each server (physical or virtual) becomes its own entity. There is no commonality. This approach, by its very nature, limits scale. There is no rational way in this model, to deal with potentially tens or hundreds of thousands of containers, each slightly unique.
If any change needs to be made to either the infrastructure or the code that runs on it, it is updated and versioned via the CI/CD mechanism discussed above. The resulting new versions of the infrastructure and/or code are deployed, and the older versions are drained and shut down. Kubernetes therefore drives an assumption of immutability.
This means that there is no need for anyone to “log in” to infrastructure components or containers / microservices. There is no reason to change running platforms. Any changes are then easily traceable and replicable via the CI/CD infrastructure.
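As a sketch of what that looks like in practice (the field values here are illustrative), shipping a new version means editing the manifest, committing it, and re-applying it. Nothing is patched in place: Kubernetes starts containers from the new image and drains the old ones according to a declared rollout strategy:

```yaml
# Fragment of a Deployment spec. To release a new version, only the image
# tag changes in git; the platform replaces Pods rather than modifying them.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1      # drain old Pods one at a time
      maxSurge: 1            # start one new Pod before draining an old one
  template:
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.4.3   # only this line changed
```

Because every change arrives this way, the git history and the CI/CD logs are a complete record of what ran, when, and why, with no hand-applied drift to reconstruct.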
In Kubernetes, as in most cloud-native systems, the definition of the services and the infrastructure is declarative. Instead of configuring the system based on what is currently running (such as firewall or load-balancing rules keyed to the specific IP addresses of current workloads), you declare what the system should do in a given situation. The declarations define the behavior of the system, and it is up to the system to evaluate the existing environment and compare it to the desired state expressed by the declarations injected into Kubernetes, usually by the CI/CD environment.
This allows for the system to be configured a priori, and then, as applications or services are deployed or modified, those declarations will be adhered to, and the system will self-adjust and continue to adhere to those declarations.
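A HorizontalPodAutoscaler is a good example of such an a-priori declaration of behavior rather than of specific machines. The target name and thresholds below are made up for illustration:

```yaml
# "Keep average CPU utilization around 70%, with between 3 and 10 replicas."
# The controller continuously compares observed state to this declaration
# and adjusts the replica count; no human re-configures anything as load shifts.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend     # hypothetical Deployment being scaled
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

The declaration can be applied before the load ever materializes; when it does, the system self-adjusts to stay within the declared bounds.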
Earlier orchestration systems were very centralized, with a single (if only logically single) controller making all the global decisions about application lifecycle actions. This worked at small to moderate scale, but it breaks down at large scale and high rates of change.
Cloud-native systems, on the other hand, distribute those control activities to the lowest level possible. In most cases, the only decisions made centrally at the controller level are system-wide ones, such as cluster-wide service proxying and the selection of which nodes should host which workloads. The actual actions that instantiate workloads, monitor them, and life-cycle them are taken locally, on the node hosting the workload in question. This allows the system to scale as the cluster grows and keeps decisions that are inherently ephemeral, such as the lifecycle of a specific workload, local to the node hosting that workload.
As I have discussed earlier, there are a huge number of moving parts in an at-scale microservice environment. There is just no way to uniquely refer to each and every one of those elements. Even things like IP addresses, which, in the past were fairly static, are much more ephemeral. One minute a given IP address may refer to a web front-end, the next minute that web front-end might be life-cycled out, and that IP address would then refer to a database or some other component. So, how do you “herd the kittens”?
Systems like Kubernetes do that by referring to metadata that is part of the code that defines the given service, infrastructure component, etc. In short, you attach labels or other metadata to components or applications and then write the declarative statements I mentioned earlier by referring to those same labels. A label can be applied to multiple components, and multiple labels can be attached to a single component.
This is a very flexible, automatable, and scalable mechanism to refer to all the components in a cluster.
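Concretely, labels are just key/value metadata in an object’s definition, and selectors pick out every object whose labels match, regardless of name or IP address. The label names and values here are illustrative:

```yaml
# Labels attached to a component (for example, in a Pod template's metadata).
# One component can carry several labels:
metadata:
  labels:
    app: web-frontend     # which application this belongs to
    tier: frontend        # its role in the stack
    release: stable       # which rollout track it is on

# Elsewhere, a declarative rule selects components by label rather than
# by identity -- this one matches every component labeled tier=frontend:
selector:
  matchLabels:
    tier: frontend
```

Because the selector matches whatever happens to carry the label at that moment, components can come and go freely without any rule being rewritten.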
Let’s say that you have a web service that you want to expose to the rest of the world. That service will be handled by some number of actual server nodes that will need to scale with load. You will need to adjust firewall rules to allow traffic to the server nodes that are offering the service, and need to be dynamic due to scaling. You will also want to do blue/green or canary deployment testing before new versions go fully production. How would you do that in the legacy world?
Most of these are manual steps that are slow, error-prone, and do not scale to thousands of microservices and containers.
Now consider how you would meet the same requirements in Kubernetes.
The key thing to realize here is that almost all of this is automated. What changes are the declarative statements that identify what something is (a microservice, a service definition, etc.) via labels, and the rules that define how those labels are composed into a specific outcome (e.g., exposing a service).
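To make that concrete, the whole scenario above (external exposure, dynamic membership, blue/green cutover) reduces to one small declaration. The names and the `track` label are hypothetical, and the exact load-balancer behavior depends on the underlying platform:

```yaml
# Exposing the web service: traffic flows to whatever Pods currently carry
# these labels, so the firewall/load-balancer plumbing follows automatically
# as Pods scale up and down.
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  type: LoadBalancer        # ask the platform for an external entry point
  selector:
    app: web-frontend
    track: stable           # blue/green cutover: repoint by changing this one line
  ports:
  - port: 80                # externally visible port
    targetPort: 8080        # port the containers actually listen on
```

A canary is just the same idea with two Deployments, one labeled `track: stable` and one `track: canary`, and a selector (or a second Service) that routes a slice of traffic to the canary before the label is flipped for everyone.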
The humans are freed to define “how the system is supposed to behave” and the system takes care of the repetitive tasks that are so often the source of misconfiguration errors and latency in meeting requirements. The system now reacts in real-time, rather than “trouble-ticket time.”
In the next installment, I will cover how these differences affect the infosec solutions we currently use to protect our environments, for good or ill. In the installment after that, I will cover some strategies to accommodate those changes.
Stay tuned – the ride has only just begun.