Rethinking observability for Kubernetes

Observability is a staple of high-performing software and DevOps teams. Research shows that a comprehensive observability solution, along with a number of other technical practices, positively contributes to continuous delivery and service uptime.

Observability is sometimes confused with monitoring, but the two are distinct. Observability refers to a technical solution that enables teams to actively debug a system. It is based on exploring activities, properties, and patterns that are not defined in advance. Monitoring, in contrast, is a technical solution that enables teams to watch and understand the state of their systems, and it is based on gathering pre-defined sets of metrics or logs.
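
To make the distinction concrete, here is a minimal sketch in Python. The event records, field names, and both helper functions are hypothetical and exist only for illustration. The first function is a monitoring-style check fixed in advance; the second lets an engineer slice the same events by a field chosen at debugging time.

```python
from collections import Counter

# Hypothetical structured events; the field names are illustrative assumptions.
EVENTS = [
    {"service": "checkout", "pod": "checkout-7d9f", "status": 500, "latency_ms": 910},
    {"service": "checkout", "pod": "checkout-7d9f", "status": 500, "latency_ms": 870},
    {"service": "checkout", "pod": "checkout-b41c", "status": 200, "latency_ms": 45},
    {"service": "cart", "pod": "cart-55aa", "status": 200, "latency_ms": 30},
    {"service": "cart", "pod": "cart-55aa", "status": 200, "latency_ms": 28},
]


def monitor_error_rate(events, threshold=0.25):
    """Monitoring: a pre-defined check of one metric against a fixed threshold."""
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > threshold


def explore(events, group_by, where=lambda e: True):
    """Observability: count matching events by any field chosen at query time."""
    return Counter(e[group_by] for e in events if where(e))


if __name__ == "__main__":
    # The pre-defined alert only says that something is wrong.
    print("alert firing:", monitor_error_rate(EVENTS))
    # An ad hoc question narrows the failures down to a single pod.
    print(explore(EVENTS, "pod", where=lambda e: e["status"] >= 500))
```

The alert reports that the error rate is too high, while the ad hoc query shows which pod is responsible, a question nobody had to anticipate when the telemetry was set up.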

What makes Kubernetes observability different?

Conventional observability and monitoring tools were designed for monolithic systems, where they observe the health and behavior of a single application instance. Distributed microservices environments, like those running on Kubernetes, change constantly, with hundreds or even thousands of pods being created and destroyed within minutes. Because the environment is so dynamic, pre-defined metrics and logs are not effective for troubleshooting, and approaches that work well for monoliths fall short. An observability solution that is purpose-built for a distributed microservices architecture is needed to match the unpredictable nature of Kubernetes cluster behavior and to capture the data teams require to identify and troubleshoot issues in real time.

Kubernetes observability challenges

Kubernetes provides abstraction and simplicity with a declarative model to program complex deployments. However, when it comes to debugging microservices, this abstraction and simplicity actually create complexity. There are several reasons why.

  • Microservices architectures on Kubernetes are complex in themselves, involving tens to hundreds of microservices that communicate with one another. Debugging this architecture requires specialized tools.
  • Kubernetes clusters run on a distributed infrastructure that is spread across on-premises, hybrid, and cloud environments.
  • Kubernetes infrastructure is dynamic. The platform spins up required resources and provides ephemeral infrastructure to scale the application based on demand.
  • Kubernetes deployments need fine-grained security and an observability model that complements a defense-in-depth approach. Some DNS issues, for example, can indicate a compromised resource in the cluster.
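
As one illustration of the DNS point in the last item, the sketch below flags pods whose DNS lookups mostly fail. Everything in it is an assumption made for the example: the record layout, the pod names, and the thresholds are hypothetical, and a high share of NXDOMAIN responses is only one possible signal that a workload deserves a closer look, not proof of compromise.

```python
from collections import defaultdict

# Hypothetical DNS records as (pod, queried name, response code).
DNS_LOG = [
    ("web-6c4d", "api.default.svc.cluster.local", "NOERROR"),
    ("web-6c4d", "db.default.svc.cluster.local", "NOERROR"),
    ("web-6c4d", "cache.default.svc.cluster.local", "NOERROR"),
    ("web-6c4d", "api.default.svc.cluster.local", "NOERROR"),
    ("batch-91fe", "xk2qv9.example", "NXDOMAIN"),
    ("batch-91fe", "p0zr7m.example", "NXDOMAIN"),
    ("batch-91fe", "queue.default.svc.cluster.local", "NOERROR"),
    ("batch-91fe", "m3ht1c.example", "NXDOMAIN"),
    ("batch-91fe", "q8wd5s.example", "NXDOMAIN"),
]


def suspicious_pods(records, min_queries=4, max_failure_ratio=0.5):
    """Return pods whose share of NXDOMAIN responses exceeds max_failure_ratio."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for pod, _name, rcode in records:
        totals[pod] += 1
        if rcode == "NXDOMAIN":
            failures[pod] += 1
    return sorted(
        pod
        for pod, count in totals.items()
        # Ignore pods with too few queries to judge.
        if count >= min_queries and failures[pod] / count > max_failure_ratio
    )


if __name__ == "__main__":
    print(suspicious_pods(DNS_LOG))
```

In a real cluster the thresholds would need tuning per workload, since some legitimate applications also generate many failed lookups.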

Given the complex nature of Kubernetes microservices deployments and the overwhelming amount of data they generate, it is not humanly possible to diagnose and troubleshoot this kind of environment unaided. This becomes even more problematic when mission-critical applications are involved. As application density increases and the computing environment becomes more dynamic, the problem grows worse every day.

Enabling Kubernetes observability with artificial intelligence

Existing tools are inadequate: it’s time to re-imagine the solution for this critical observability problem. We can start by applying machine learning and artificial intelligence (AI) to observability; in effect, deploying machines to debug machines. By automating dynamic monitoring processes, for example, we can create intelligent observability that converts telemetry data into actionable insights. We can use AI to analyze this data to identify problem patterns, and create unique observability “snapshots” that can be used to build reference templates, which can be catalogued and accessed by troubleshooting teams when issues arise.
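
The idea of turning telemetry into a catalogued snapshot can be sketched in a few lines of Python. This is not the approach of any particular product: it uses a rolling z-score, a simple statistical stand-in for the machine learning models discussed above, and the function names, pod name, metric name, and thresholds are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, z_threshold=3.0):
    """Return (index, value, z_score) for samples far from the rolling baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue  # A flat baseline gives no scale to compare against.
        z = (samples[i] - mu) / sigma
        if abs(z) >= z_threshold:
            anomalies.append((i, samples[i], round(z, 2)))
    return anomalies


def snapshot(pod, metric, samples, anomaly, context=3):
    """Bundle an anomaly with the samples just before it for later reference."""
    index, value, z_score = anomaly
    return {
        "pod": pod,
        "metric": metric,
        "index": index,
        "value": value,
        "z_score": z_score,
        "before": samples[max(0, index - context):index],
    }


if __name__ == "__main__":
    # Hypothetical request latencies in milliseconds for one pod.
    latencies = [40, 42, 41, 39, 43, 41, 40, 400, 42, 41]
    for anomaly in find_anomalies(latencies):
        print(snapshot("checkout-7d9f", "latency_ms", latencies, anomaly))
```

A production system would need a more robust baseline, because a single outlier like the one above inflates the rolling statistics for the samples that follow it.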

Kubernetes introduces dynamism to distributed infrastructure, and a high level of complexity for troubleshooting teams. By applying machine learning and AI to observability, we can open exciting new avenues to make Kubernetes more consistently reliable and secure, speed time to root cause and resolution, and make it easier for new SREs, DevOps engineers, and service owners to effectively debug dynamic distributed Kubernetes environments.


To learn more about new cloud-native approaches for establishing security and observability for containers and Kubernetes, check out this O’Reilly eBook, authored by Tigera.


This article was originally posted to Container Journal. Reposted with permission.
