Prometheus is an open-source monitoring and alerting toolkit designed especially for microservices and containers. It lets you run flexible queries and configure real-time notifications. You can use it to gain visibility into your containerized workloads, APIs, and other distributed services and applications. Additionally, Prometheus can assist with cloud-native security by detecting irregular traffic or activity that could escalate into an attack.
Prometheus uses a pull-based system that sends HTTP requests to the targets it monitors. Each request is called a scrape and is issued according to the configuration defined in your deployment file. The response to each scrape is parsed, and the metrics it contains are stored in a repository.
This repository is essentially a custom time series database, deployed on the Prometheus server, that can handle large volumes of data. One Prometheus server can simultaneously monitor thousands of machines.
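To make the scrape step more concrete, here is a minimal Python sketch of what parsing a scrape response might look like. It handles only a simplified subset of the text-based exposition format: it skips comment lines and assumes each sample line ends with its value, with no trailing timestamp. The function name `parse_scrape` and the sample payload are illustrative assumptions, not part of Prometheus itself.

```python
def parse_scrape(body: str) -> dict:
    """Parse a simplified subset of the text exposition format.

    Assumption: every sample line is "<series> <value>" with no
    trailing timestamp; HELP/TYPE comment lines are ignored.
    """
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical scrape response from an instrumented application.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
process_open_fds 12
"""

parsed = parse_scrape(SAMPLE)
```

A real Prometheus server does considerably more (type handling, timestamps, escaping, staleness), so treat this only as an illustration of the request-parse-store flow.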
There are two ways in which Prometheus can access data—either directly from the client libraries of your applications or indirectly through exporters.
An exporter is a piece of software that runs alongside the application. You can use exporters to access data from systems whose code you do not control, such as kernel metrics.
Exporters are designed to do the following:
Prometheus provides four main types of metrics:
Learn more in our detailed guide to Prometheus metrics
Once your applications expose metrics, they can provide data to Prometheus. However, you still need to tell Prometheus where to look for this data. Prometheus uses service discovery to find the targets to scrape.
Kubernetes clusters are equipped with labels, annotations, and a mechanism for tracking status and changes for different elements. To discover targets, Prometheus needs to use the Kubernetes API. You can expose any Kubernetes entity to Prometheus, including nodes, services, endpoints, ingress, and pods.
Prometheus can retrieve machine-level metrics separately from application information. Node exporters can help expose memory, disk space, CPU utilization, and bandwidth metrics. You can also expose metrics about control groups (cgroups).
After Prometheus collects the data, you can use the PromQL query language to access it. Graphical interfaces such as Grafana can run PromQL queries to visualize the data, and alerting rules written in PromQL can send alerts through Alertmanager, the alert-handling component of the Prometheus ecosystem. Alertmanager groups alerts and routes them to “receivers”, which may be email, PagerDuty, Opsgenie, or similar tools.
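As an illustration of how a client might run a PromQL query, the Python sketch below builds a URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and extracts values from a response body. The server address, the metric name `http_requests_total`, and the sample response are assumptions made for the example; the sketch does not send any network request.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(response_body: str) -> dict:
    """Map each series' label set to its sample value (instant vector only)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    values = {}
    for item in payload["data"]["result"]:
        labels = tuple(sorted(item["metric"].items()))
        # Each instant-vector sample is [unix_timestamp, "value-as-string"].
        values[labels] = float(item["value"][1])
    return values


# Hypothetical server address and query.
url = build_query_url("http://localhost:9090", "rate(http_requests_total[5m])")

# Hand-written response in the shape returned for an instant vector.
sample_response = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"job":"api"},"value":[1700000000,"0.5"]}]}}'
)
values = extract_values(sample_response)
```

Tools such as Grafana issue queries of this kind on your behalf, so you would typically only write the PromQL expression itself.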
Here are some of the main advantages and disadvantages of Prometheus for Kubernetes.
Pros of Prometheus:
Cons of Prometheus:
You can configure monitoring processes on Prometheus by using YAML files, which specify permissions, configuration, and services.
Prometheus reads these YAML files to determine which resources it may access and which Kubernetes cluster elements to scrape. Learn more about Prometheus YAML configuration in its documentation.
After the configuration is complete, you can install Prometheus as a container on a Kubernetes cluster. You can use various orchestration options to deploy these Docker containers, such as StatefulSets, Kubernetes operators, and Helm charts.
To deploy a Prometheus server in a container, you can go to the Prometheus user interface and use the following command:
Alternatively, you can change the Docker container to a Kubernetes deployment object, which can mount the config from a ConfigMap. Here’s how:
Here are several best practices that can help you effectively implement Prometheus in Kubernetes.
In general, data is important, but not all of it is necessary for every scenario. Keep this in mind as you design your consoles and dashboards.
Instead of attempting to display all data within a single operational console, you should strive to display the most relevant information. You can do this by thinking about the most likely failure modes and then representing each one in a meaningful visual display.
Labels can help you refine and customize the data for your metrics. Each label set requires resources, such as RAM, disk space, bandwidth, and CPU. This data is important, but when you create labels on a large scale they consume a large amount of resources.
You can reduce costs by limiting labels on metrics to ten or fewer. Additionally, you should use labels only for metrics that require them; not all do. If you do need to assign a large number of labels to metrics, consider using dedicated analysis tools to help make the process efficient.
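To see why labels become expensive at scale, it can help to estimate how many time series a single metric may produce. The sketch below assumes the worst case, in which every combination of label values actually occurs, so the series count is the product of the number of distinct values per label. The label names and counts are hypothetical.

```python
from math import prod


def max_series(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Assumption: every combination of label values occurs. A metric
    with no labels yields exactly one series.
    """
    return prod(distinct_values_per_label.values())


# Hypothetical label sets for an HTTP request counter.
modest = {"method": 5, "status": 10, "endpoint": 40}
with_user_id = {**modest, "user_id": 10_000}

modest_series = max_series(modest)          # 5 * 10 * 40
risky_series = max_series(with_user_id)     # the same, times 10,000
```

Adding a single high-cardinality label such as a user ID multiplies the total, which is why such values are usually better kept out of labels.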
When tracking event timing, consider exposing a timestamp that indicates when each event occurred rather than the time that has passed since the event occurred. This can help eliminate the need for updating logic and can reduce errors.
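The difference between the two approaches can be sketched in Python as follows. The class and function names are hypothetical. The idea is that the application records a Unix timestamp once, when the event happens, and the elapsed time is derived at read time. In PromQL, the equivalent derivation would be an expression such as `time() - my_job_last_success_timestamp_seconds`, where the metric name is an assumption for illustration.

```python
import time


class LastSuccess:
    """Holds the Unix time of the most recent successful run."""

    def __init__(self) -> None:
        self.timestamp_seconds = 0.0

    def mark(self, now=None) -> None:
        # Written once per event; no background updater is needed.
        self.timestamp_seconds = time.time() if now is None else now


def seconds_since(timestamp_seconds: float, now: float) -> float:
    """Elapsed time, computed by the reader rather than the application."""
    return now - timestamp_seconds


job = LastSuccess()
job.mark(now=1_700_000_000.0)                        # event occurs
elapsed = seconds_since(job.timestamp_seconds, now=1_700_000_090.0)
```

If the application instead exported "seconds since last success", it would need logic to keep that value fresh, and the value would silently stop changing if that logic stalled.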
Not all components can be scraped. To monitor these components, you can use the Prometheus Pushgateway, which lets you push time series data from short-lived, service-level batch jobs to an intermediary job that Prometheus can scrape. To make this instrumentation easy, you can combine this with Prometheus’s simple text-based exposition format.
Pushgateway is ideal for capturing the outcome of service-level batch jobs. It is not designed for other use cases. If you attempt to use a single Pushgateway to monitor multiple instances, for example, the Pushgateway will not only become a single point of failure but also a potential bottleneck.
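As a rough sketch of how a batch job might report its outcome, the Python code below builds (but does not send) an HTTP request that pushes metrics in the text-based exposition format to a Pushgateway under the `/metrics/job/<job>` path. The gateway address, job name, and metric names are assumptions for the example, and job names containing `/` are not handled here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway_url: str, job: str, metrics: dict) -> Request:
    """Build a PUT request that replaces all metrics in the job's group."""
    # One "<name> <value>" sample per line; the body must end with a newline.
    lines = [f"{name} {value}" for name, value in metrics.items()]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return Request(url, data=body, method="PUT")


# Hypothetical gateway address and batch-job metrics.
request = build_push_request(
    "http://pushgateway.example:9091",
    "nightly_backup",
    {
        "backup_last_success_timestamp_seconds": 1700000000,
        "backup_bytes": 1024,
    },
)
# To actually send it, pass `request` to urllib.request.urlopen(...).
```

Using PUT replaces every metric previously pushed for that job grouping, while POST would replace only metrics with the same names, so the choice depends on whether the job reports a complete set each time.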
When you include metrics in code that is called more than 100,000 times a second, or is performance critical, you should limit the operations you perform in the inner loop. Here are several techniques that can help you protect inner loops:
Calico Cloud and Calico Enterprise help rapidly pinpoint and resolve performance, connectivity, and security policy issues between microservices running on Kubernetes clusters across the entire stack. They offer the following key features for Kubernetes monitoring and observability, which are not available with Prometheus:
Learn more about Calico for Kubernetes monitoring and observability