
Kubernetes Monitoring: 7 Tools and 4 Best Practices You Must Know About


What is Kubernetes Monitoring?

Kubernetes monitoring helps you identify issues and proactively manage Kubernetes clusters. Effective monitoring for Kubernetes clusters makes it easier to manage your containerized infrastructure, by tracking uptime, utilization of cluster resources (such as memory, CPU, and storage), and interaction between cluster components.

In Kubernetes, cluster operators monitor the cluster and alert you if the required number of pods is not running, if resource utilization is nearing a critical limit, or if a failure or configuration error prevents a pod or node from joining the cluster. Beyond this built-in monitoring functionality, many organizations use specialized cloud-native monitoring tools to gain full visibility into cluster activity.

This is part of an extensive series of guides about observability.


What Kubernetes Metrics Should You Measure?

There are two main levels of monitoring in Kubernetes:

  • Cluster monitoring – Keeps track of the health of an entire Kubernetes cluster. Helps you verify if nodes are functioning properly and at the right capacity, how many applications run on a node, and how the cluster as a whole utilizes resources.
  • Pod monitoring – Keeps track of issues affecting individual pods, such as resource utilization of the pod, application metrics, and metrics related to replication or autoscaling of the pod.
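The two levels can be illustrated with a small sketch that rolls pod-level samples up into node-level figures. The `PodSample` type, the sample values, and the node capacities are hypothetical; in practice these numbers would come from your metrics pipeline.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """One pod-level measurement (hypothetical shape, for illustration only)."""
    name: str
    node: str
    cpu_millicores: int  # CPU currently used by the pod
    memory_mib: int      # memory currently used by the pod


def pods_per_node(pods):
    """Cluster-level view: how many pods run on each node."""
    counts = {}
    for pod in pods:
        counts[pod.node] = counts.get(pod.node, 0) + 1
    return counts


def node_cpu_utilization(pods, node_capacity_millicores):
    """Cluster-level view: fraction of each node's CPU capacity in use."""
    used = {}
    for pod in pods:
        used[pod.node] = used.get(pod.node, 0) + pod.cpu_millicores
    return {
        node: used.get(node, 0) / capacity
        for node, capacity in node_capacity_millicores.items()
    }


# Made-up sample data.
SAMPLE_PODS = [
    PodSample("web-1", "node-a", 250, 128),
    PodSample("web-2", "node-a", 250, 128),
    PodSample("db-1", "node-b", 1000, 2048),
]
NODE_CAPACITY = {"node-a": 2000, "node-b": 4000}
```

Each `PodSample` is a pod-level data point, while `pods_per_node` and `node_cpu_utilization` answer the cluster-level questions of how many applications run on a node and how much of its capacity is in use.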

The following table summarizes important metrics for cluster and pod monitoring.

Top 7 Kubernetes Monitoring Tools

Kubernetes is a complex environment, and containerized applications can be distributed across multiple environments. Monitoring solutions must be able to aggregate metrics from across the distributed environment, and deal with the ephemeral nature of containerized resources. The following are popular monitoring tools designed for a containerized environment.

1. Prometheus



A popular monitoring tool that was developed by SoundCloud before being donated to the Cloud Native Computing Foundation (CNCF), Prometheus provides alerts with detailed metrics and analysis for Kubernetes and Docker. It is designed for monitoring container-based microservices and applications running at scale. Prometheus is often used in combination with Grafana to enable data visualization.
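As a rough illustration, the sketch below builds a request URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The server address and the `default` namespace are assumptions, and the query relies on the `container_cpu_usage_seconds_total` metric being scraped in your cluster, which depends on your setup.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url, promql):
    """Return a URL for a Prometheus instant query with the PromQL URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Per-pod CPU usage rate over the last 5 minutes (namespace is an assumption).
POD_CPU_QUERY = (
    'sum by (pod) '
    '(rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))'
)

# Hypothetical Prometheus address; replace with your own.
url = build_instant_query_url("http://prometheus.example:9090/", POD_CPU_QUERY)
```

An HTTP GET on the resulting URL returns the query result as JSON, which is also how Grafana and other clients typically read data from Prometheus.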

2. Grafana



This open-source platform for visualization of metrics and analytics provides four built-in dashboards for Kubernetes—Cluster, Node, Pod/Container and Deployment. Kubernetes administrators can create data-rich dashboards in Grafana using the information sourced from Prometheus.

3. Jaeger



This open-source tracing system, developed by Uber, is used to monitor and troubleshoot distributed transactions. Jaeger addresses software issues related to distributed context propagation and latency optimization.

4. Kubernetes Dashboard


Kubernetes Dashboard is a web-based user interface for Kubernetes. You can use it to:

  • Deploy containerized applications to a Kubernetes cluster
  • Troubleshoot containerized applications
  • Manage cluster resources
  • Get an overview of the applications running on a cluster
  • Create and modify individual Kubernetes resources
  • Monitor the health of Kubernetes resources and discover errors

5. Kiali



Kiali provides a management UI for service mesh architectures based on Istio. It provides dashboards for visualization, and allows you to operate the mesh with powerful capabilities for configuration and validation. The structure of the service mesh is revealed via inferred traffic topology. Kiali offers detailed metrics and visualization of the health of your mesh, enables access to Grafana and integrates with Jaeger for distributed tracing.

6. Kubewatch

Kubewatch is an open-source Kubernetes watcher written in Go and developed by Bitnami Labs. It complements the monitoring solution by providing an easy-to-use interface between the Kubernetes cluster and collaboration tools.

You can monitor changes to specified Kubernetes resources and report them directly to Slack or other collaboration platforms such as HipChat, Mattermost, and Flock. Generic webhooks allow custom integrations, for example with IT service management (ITSM) tools like ServiceNow.
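To make the idea concrete, here is a minimal sketch of turning a resource change into a chat notification. It does not reproduce Kubewatch's actual message format; the function names and the webhook URL are hypothetical, and the payload simply follows the `{"text": ...}` shape that Slack incoming webhooks accept.

```python
import json
from urllib.request import Request


def format_change_event(kind, namespace, name, action):
    """Build a JSON payload describing one resource change."""
    text = f"{kind} {namespace}/{name} was {action}"
    return json.dumps({"text": text})


def build_webhook_request(webhook_url, payload):
    """Prepare (but do not send) an HTTP POST carrying the JSON payload."""
    return Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = format_change_event("Pod", "default", "web-1", "deleted")
# Hypothetical webhook address; a real one comes from your chat platform.
request = build_webhook_request("https://hooks.example/services/T000/B000", payload)
```

Passing `request` to `urllib.request.urlopen` would deliver the notification. Keeping formatting and delivery separate makes it straightforward to point the same event at a different platform.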

7. EFK Stack

The EFK Stack integrates three tools (Elasticsearch, Fluentd, and Kibana) to collect, store, and visualize log data. Elasticsearch is a search engine that ingests and stores data in a central repository, while Fluentd collects data from the logs of Kubernetes pods and routes it to Elasticsearch. Kibana is the visualization front end for Elasticsearch and functions as the UI for the EFK Stack, letting you explore logs and metrics in custom dashboards.


Learn more in our detailed guide to Kubernetes monitoring tools

4 Kubernetes Monitoring Best Practices

Here are several best practices that can help you effectively monitor and troubleshoot Kubernetes environments.

1. Automatically Detect Application Issues by Tracking the API Gateway for Microservices

Granular resource metrics (memory, CPU, load, and so on) are important for identifying issues with Kubernetes microservices, but they can be complex and difficult to interpret. The most useful KPIs for identifying microservice issues are API metrics such as request rate, error rate, and latency. These metrics quickly pinpoint degradation in a component of a microservice.

You can easily discover service-level metrics with automatic detection of REST API request anomalies, for instance over an ingress controller such as Istio or Nginx. These metrics measure every Kubernetes service in the same way, providing consistent visibility across the clusters.
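The three API metrics can be computed from raw request records along the following lines. This is a sketch under stated assumptions: the `ApiCall` record is hypothetical, errors are counted as HTTP 5xx responses, and latency is summarized as a nearest-rank 95th percentile. Your ingress controller or service mesh may define these differently.

```python
import math
from dataclasses import dataclass


@dataclass
class ApiCall:
    """One observed request (hypothetical record shape)."""
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def api_kpis(calls, window_seconds):
    """Compute request rate, error rate, and p95 latency for one time window."""
    total = len(calls)
    if total == 0:
        return {"request_rate": 0.0, "error_rate": 0.0, "p95_latency_ms": 0.0}
    errors = sum(1 for call in calls if call.status >= 500)
    latencies = sorted(call.latency_ms for call in calls)
    rank = math.ceil(0.95 * total)  # nearest-rank percentile, 1-based
    return {
        "request_rate": total / window_seconds,  # requests per second
        "error_rate": errors / total,            # fraction of failed calls
        "p95_latency_ms": latencies[rank - 1],
    }


# Made-up window: ten calls in ten seconds, two of them server errors.
SAMPLE_CALLS = [ApiCall(200, 10.0 * i) for i in range(1, 9)] + [
    ApiCall(503, 90.0),
    ApiCall(503, 100.0),
]
kpis = api_kpis(SAMPLE_CALLS, window_seconds=10)
```

Because the same three numbers are computed for every service, they can be compared across services and clusters, which is the consistency the text describes.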

2. Always Alert on High Disk Utilization

High disk utilization is one of the most common problems on any system. There is no magic solution, and you cannot automatically recover volumes that are statically attached to StatefulSet resources. A typical alert threshold is 75% to 80% utilization. High disk utilization alerts are always important and usually indicate a problem with your application. Monitor all disk volumes, including the root file system. Detecting changes in usage patterns early can reduce issues later on.
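A threshold check of this kind might look like the sketch below. The 80% default follows the range mentioned above; the function names are illustrative, and a real setup would usually express the rule in the alerting system rather than in application code.

```python
import shutil

ALERT_THRESHOLD = 0.80  # alert at 80% utilization; 0.75 is also common


def utilization(used_bytes, total_bytes):
    """Fraction of the volume that is in use."""
    return used_bytes / total_bytes


def should_alert(used_bytes, total_bytes, threshold=ALERT_THRESHOLD):
    """True when utilization has reached or passed the threshold."""
    return utilization(used_bytes, total_bytes) >= threshold


def check_path(path="/"):
    """Check the volume that holds `path`, such as the root file system."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_alert(usage.used, usage.total)
```

Calling `check_path` once per mounted volume, including `/`, covers the advice to monitor every disk volume.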

3. Monitor End-User Experience when Running Kubernetes

End-user experience management is not built into the Kubernetes platform. However, an application’s primary objective is to provide a positive experience to the end-user, and this should be built into your monitoring strategy for Kubernetes.

To understand how your application is performing, you need to collect data via both synthetic and real-user monitoring. This will allow you to see how the end-user interacts with Kubernetes workloads, how the app responds, and how user-friendly it is. It will also inform you if you need to adjust anything to improve the usability and frontend.
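Synthetic monitoring can be as simple as a scripted probe that requests a page on a schedule and records whether it succeeded and how long it took. The sketch below is a minimal version of that idea; the `probe` function, its parameters, and the stub are hypothetical, and the `fetch` and `clock` arguments exist so the probe can be exercised without a network.

```python
import time
from urllib.request import urlopen


def probe(url, fetch=None, clock=time.monotonic, timeout=5.0):
    """Request `url` once and report success, status code, and latency."""
    if fetch is None:
        def fetch(target):
            # Default fetcher: a plain HTTP GET using the standard library.
            with urlopen(target, timeout=timeout) as response:
                return response.status
    started = clock()
    try:
        status = fetch(url)
        ok = 200 <= status < 400
    except OSError:
        # Covers connection failures and urllib's HTTP errors (4xx/5xx).
        status, ok = None, False
    return {"url": url, "ok": ok, "status": status, "latency_s": clock() - started}


def always_down(_url):
    """Stub fetcher that simulates an unreachable application."""
    raise ConnectionRefusedError("simulated outage")
```

Running `probe` from several locations at a fixed interval gives a baseline for availability and response time, which real-user monitoring can then complement with data from actual sessions.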

4. Prepare Monitoring for a Cloud Environment

If Kubernetes is running in the cloud, certain factors need to be considered when planning your monitoring strategy. In the cloud, you also need to monitor the following:

  • IAM events – Including changing permissions, successful and failed logins. This is a security best practice for a cloud-based installation or environment.
  • Cloud API – Cloud providers have their own APIs, which your Kubernetes installation uses to request resources. Calls to these APIs should be monitored.
  • Cost – Cloud costs can grow rapidly. Cost monitoring can help you budget for cloud-based Kubernetes services and avoid overspending.
  • Network performance – In a cloud-based installation, the network can be the biggest obstacle to application performance. Regularly monitor your cloud network to prevent downtime and user experience issues.

Kubernetes Monitoring and Observability with Calico

Because Kubernetes workloads are highly dynamic, ephemeral, and are deployed on a distributed and agile infrastructure, Kubernetes poses a unique set of monitoring and observability challenges. As such, Kubernetes-native monitoring and observability is required to monitor and troubleshoot communication issues between microservices in the Kubernetes cluster.

More specifically, context about microservices, pods, and namespaces is needed so that multiple teams can collaborate effectively to identify and resolve issues. Calico Cloud and Calico Enterprise help rapidly pinpoint and resolve performance, connectivity, and security policy issues between microservices running on Kubernetes clusters across the entire stack.

Calico Cloud and Calico Enterprise are currently the only Kubernetes monitoring tools that offer the following unique features for Kubernetes observability:

  1. Dynamic Service Graph – A point-to-point, topographical representation of traffic flow and policy that shows how workloads within the cluster are communicating, and across which namespaces. Also includes advanced capabilities to filter resources, save views, and troubleshoot service issues.
  2. DNS Dashboard – Helps accelerate DNS-related troubleshooting and problem resolution in Kubernetes environments by providing an interactive UI with exclusive DNS metrics.
  3. L7 Dashboard – Provides a high-level view of HTTP communication across the cluster, with summaries of top URLs, request duration, response codes, and volumetric data for each service.
  4. Dynamic Packet Capture – Captures packets from a specific pod or collection of pods with specified packet sizes and duration, in order to troubleshoot performance hotspots and connectivity issues faster.
  5. Application-level Observability – Provides a centralized, all-encompassing view of service-to-service traffic in the Kubernetes cluster to detect anomalous behavior like attempts to access applications or restricted URLs, and scans for particular URLs.
  6. Unified Controls – A single, unified management plane provides a centralized point-of-control for unified security and observability on multiple clouds, clusters, and distros. Users can monitor and observe across environments with a single pane of glass.

Learn more about Calico for Kubernetes monitoring and observability

