Prometheus Monitoring: Use Cases, Metrics, and Best Practices

What Is Prometheus?

Prometheus is an open-source technology designed to provide monitoring and alerting functionality for cloud-native environments, including Kubernetes. It can collect and store metrics as time-series data, recording information with a timestamp. It can also collect and record labels, which are optional key-value pairs.

Key features of Prometheus include:

  • Multidimensional data model – Using time-series data, which is identified by metric name and key-value pairs.
  • PromQL – A flexible querying language that can leverage the multi-dimensional data model.
  • No reliance on distributed storage – All single server nodes remain autonomous.
  • Pull model – Prometheus can collect time-series data by actively “pulling” data over HTTP.
  • Pushing time-series data – Available through the use of an intermediary gateway.
  • Monitoring target discovery – Available through static configuration or service discovery.
  • Visualization – Prometheus offers multiple types of graphs and dashboards.

Prometheus was originally developed at SoundCloud in 2012. Since its inception, Prometheus has become a popular monitoring tool supported by an independent community of contributors. In 2016, Prometheus joined the Cloud Native Computing Foundation (CNCF), and is now a graduated CNCF project.

This is part of an extensive series of guides about performance testing.

How Does Prometheus Monitoring Work?

To get metrics, Prometheus requires an exposed HTTP endpoint. Once an endpoint is available, Prometheus can start scraping numerical data, capture it as a time series, and store it in a local database suited to time-series data. Prometheus can also be integrated with remote storage repositories.
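A minimal scrape configuration illustrates this setup. In the sketch below, the job name, target address, and scrape interval are placeholders to adapt to your own environment:

    # prometheus.yml – minimal scrape configuration (illustrative values)
    global:
      scrape_interval: 15s              # how often to pull metrics from each target

    scrape_configs:
      - job_name: "my-service"          # hypothetical job name
        metrics_path: /metrics          # default path exposed by most instrumented services
        static_configs:
          - targets: ["my-service.example.com:8080"]   # hypothetical host:port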

Users can write queries to create temporary time series from the source data. These series are defined by metric names and labels. Queries are written in PromQL, Prometheus's own query language, which lets users select and aggregate time-series data in real time. PromQL expressions can also define alert conditions, resulting in notifications to external systems like email, PagerDuty, or Slack.
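As a sketch of what such queries look like, the PromQL expressions below assume a conventional counter named http_requests_total with service and status labels (not something every service exposes); they compute a per-service request rate and an error ratio that could back an alert:

    # Per-second request rate over the last 5 minutes, summed per service
    sum by (service) (rate(http_requests_total[5m]))

    # Error ratio above 5% – an expression that could drive an alert
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.05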

Prometheus can display collected data in tabular or graph form, shown in its web-based user interface. You can also use APIs to integrate with third-party visualization solutions like Grafana.

What Can You Monitor with Prometheus?

Prometheus is a versatile monitoring tool, which you can use to monitor a variety of infrastructure and application metrics. Here are a few common use cases.

Service Metrics

Prometheus is typically used to collect numeric metrics from services that run 24/7 and expose metric data via HTTP endpoints. This can be done manually or with one of the various client libraries. Metrics are exposed in a simple text-based format, with one metric per line, separated by line feed characters. The endpoint is served by an HTTP server that Prometheus scrapes based on the specified hostname, port, and path.
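For illustration, a scraped /metrics endpoint typically looks like the sample below; the metric names and label values are examples rather than anything a particular service is guaranteed to expose:

    # HELP http_requests_total Total number of HTTP requests served.
    # TYPE http_requests_total counter
    http_requests_total{method="get",code="200"} 1027
    http_requests_total{method="post",code="400"} 3
    # HELP process_cpu_seconds_total Total user and system CPU time in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 4.2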

Prometheus can also be used for distributed services that run on multiple hosts. Each instance publishes its own metrics and carries an identifying name or label so that Prometheus can distinguish between them.

Host Metrics

You can monitor the operating system to identify when a server's hard disk is full or when a server constantly operates at 100% CPU. To do this, install an exporter on the host (for Linux hosts, typically the Node Exporter) that collects operating system metrics and publishes them at an HTTP-reachable endpoint.
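Assuming the host runs the Node Exporter, PromQL queries along these lines can surface the conditions mentioned above; the mount point and aggregation choices are illustrative:

    # Percentage of disk space still available on the root filesystem
    100 * node_filesystem_avail_bytes{mountpoint="/"}
        / node_filesystem_size_bytes{mountpoint="/"}

    # Average CPU utilization per instance over the last 5 minutes
    100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))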

Website Uptime/Up Status

Prometheus doesn't monitor website status out of the box, but you can use the blackbox exporter to enable this. You specify the target URL as the endpoint to query, and the exporter performs an uptime check and returns information such as the website's response time. You define the hosts to be queried in the prometheus.yml configuration file, using relabel_configs to ensure Prometheus routes the probe through the blackbox exporter.
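A sketch of such a configuration is shown below. The probed URL, module name, and exporter address are placeholders; the relabel_configs block follows the pattern documented for the blackbox exporter:

    scrape_configs:
      - job_name: "blackbox-http"
        metrics_path: /probe
        params:
          module: [http_2xx]                 # module defined in the blackbox exporter's own config
        static_configs:
          - targets:
              - https://example.com          # hypothetical website to probe
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target     # pass the website URL as the ?target= parameter
          - source_labels: [__param_target]
            target_label: instance           # keep the probed URL as the instance label
          - target_label: __address__
            replacement: blackbox-exporter:9115   # hypothetical address of the blackbox exporter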

Cronjobs

To check whether a cronjob is running at the specified intervals, you can use the Pushgateway to expose metrics to Prometheus through an HTTP endpoint. You push the timestamp of the last successful job (e.g., a backup job) to the Pushgateway and compare it with the current time in Prometheus. If the elapsed time exceeds the specified threshold, an alert is triggered.
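As an illustrative sketch using the Python client library (the metric name, job name, and Pushgateway address are assumptions, not defaults):

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    last_success = Gauge(
        "backup_last_success_timestamp_seconds",   # hypothetical metric name
        "Unix timestamp of the last successful backup run",
        registry=registry,
    )
    last_success.set_to_current_time()

    # Push the metric to the Pushgateway (address is hypothetical)
    push_to_gateway("pushgateway.example.com:9091", job="nightly-backup", registry=registry)

On the Prometheus side, an expression such as time() - backup_last_success_timestamp_seconds > 86400 (no successful run within the last 24 hours) can then drive the alert.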

Why Use Prometheus for Kubernetes Monitoring?

Prometheus is a common choice for Kubernetes monitoring, because it was built for a cloud-native environment. Here are several key benefits of using Prometheus to monitor Kubernetes workloads:

  • Multidimensional data model – The use of key-value pairs mirrors how Kubernetes itself uses labels to organize infrastructure metadata, which makes it natural to collect and analyze Kubernetes time-series data in Prometheus.
  • Accessible format and protocols – Prometheus makes exposing metrics easy and simple. Metrics are human-readable and published over standard HTTP transport.
  • Service discovery – The Prometheus server periodically scrapes its targets, so services and applications do not have to constantly emit data; metrics are pulled rather than pushed. Prometheus servers can employ several techniques to auto-discover scrape targets; for example, you can configure them to filter and match container metadata (see the configuration sketch after this list).
  • Modular and highly available components – Composable services are responsible for metric collection, graphical visualization, alerting, and more. Each of these services supports sharding and redundancy.
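As a rough illustration of Kubernetes service discovery, the job below uses kubernetes_sd_configs to discover pods and keeps only those carrying a prometheus.io/scrape annotation; the job name and the annotation convention are assumptions, not built-in defaults:

    scrape_configs:
      - job_name: "kubernetes-pods"
        kubernetes_sd_configs:
          - role: pod                        # discover every pod in the cluster
        relabel_configs:
          # Keep only pods that opt in via the (conventional) prometheus.io/scrape annotation
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          # Copy the pod's namespace and name onto the resulting time series
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod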

Learn more in our detailed guide to Prometheus for Kubernetes

Prometheus Metric Types

The client libraries of Prometheus offer four core types of metrics. However, the Prometheus server does not currently save these metrics as different data types. Instead, it flattens all information into an untyped time series.

Counter

This is a cumulative metric. It represents a single monotonically increasing counter, whose value can only increase or be reset to zero on restart.

There are several use cases that suit counter metrics. For example, you can use a counter to represent the number of served requests, errors, or completed tasks. You should never use counters to expose values that can decrease, like the number of running processes.

Gauge

This metric represents one numerical value, which can arbitrarily go down and up. A gauge is often used to measure values like current memory usage or temperatures.

Histogram

A histogram samples observations, such as request durations or response sizes, and counts them in configurable buckets. A histogram can also provide a total sum of all observed values.

Summary

A summary can sample observations, such as request durations and response sizes. Additionally, it can provide a total count of the observations as well as a total sum of all observed values. It can calculate configurable quantiles over a sliding time window.
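To make the four types concrete, here is a minimal sketch using the Python client library; the metric names, bucket boundaries, and port are arbitrary illustrative choices:

    import random
    import time

    from prometheus_client import Counter, Gauge, Histogram, Summary, start_http_server

    REQUESTS = Counter("app_requests_total", "Total requests handled")               # only ever increases
    IN_PROGRESS = Gauge("app_requests_in_progress", "Requests currently in flight")  # can go up and down
    LATENCY_HIST = Histogram("app_request_latency_seconds", "Request latency",
                             buckets=(0.1, 0.5, 1.0, 2.5, 5.0))                      # observations counted per bucket
    LATENCY_SUMM = Summary("app_request_latency_summary_seconds", "Request latency") # running count and sum

    def handle_request():
        IN_PROGRESS.inc()
        start = time.time()
        time.sleep(random.uniform(0.05, 0.3))   # simulated work
        elapsed = time.time() - start
        LATENCY_HIST.observe(elapsed)
        LATENCY_SUMM.observe(elapsed)
        REQUESTS.inc()
        IN_PROGRESS.dec()

    if __name__ == "__main__":
        start_http_server(8000)                 # expose the metrics on :8000/metrics
        while True:
            handle_request()

Once this process is running, Prometheus can scrape localhost:8000/metrics, and histogram buckets can later be turned into approximate percentiles with PromQL's histogram_quantile() function.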

Learn more in our detailed guide to Prometheus metrics

Best Practices for Prometheus Monitoring

Here are several key best practices for implementing Prometheus monitoring.

Choose the Best Exporter

Prometheus uses exporters to retrieve metrics from systems that cannot easily be scraped directly, such as HAProxy or the Linux operating system. Exporters are small programs deployed on or alongside the target system; they collect its metrics and expose them over HTTP so that Prometheus can scrape them.

Several exporters may cover the same system, so you should choose the most relevant one for your purposes; this choice can critically affect the success of your Kubernetes monitoring strategy. Research the available exporters and evaluate how each handles the metrics relevant to your workloads. You should also assess the quality of an exporter based on criteria such as user feedback, how recently it was updated, and any security advisories.

Label Carefully

Consult the documentation of your chosen exporter and learn how to label your metrics in a way that provides context, and establish consistent labeling across your different monitoring targets. While you can define your own labels, remember that every unique combination of label values creates a separate time series, which consumes resources. At scale, excessive labeling increases your overall resource costs, so aim to keep the number of labels on a metric to roughly 10 or fewer.

Set Actionable Alerts

A well-defined alerting strategy can help you achieve effective performance monitoring. You should first determine which events or metrics are critical to monitor, and then set a reasonable threshold that can catch issues before they can affect your end-users. Ideally, you should define a threshold that does not cause alert fatigue. You should also ensure the notifications are properly configured to reach the appropriate team in a timely manner.
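As a sketch of how such an alert might be expressed as a Prometheus alerting rule (the alert name, metric, threshold, and labels are illustrative and reuse the hypothetical http_requests_total counter from earlier):

    # alert-rules.yml – loaded via rule_files in prometheus.yml
    groups:
      - name: service-availability
        rules:
          - alert: HighErrorRate
            expr: |
              sum(rate(http_requests_total{status=~"5.."}[5m]))
                / sum(rate(http_requests_total[5m])) > 0.05
            for: 10m                              # condition must hold for 10 minutes before firing
            labels:
              severity: critical                  # routed by Alertmanager to the on-call team
            annotations:
              summary: "Error rate above 5% for 10 minutes"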

Container Monitoring and Observability with Calico

Calico Cloud and Calico Enterprise help rapidly pinpoint and resolve performance, connectivity, and security policy issues between microservices running on Kubernetes clusters across the entire stack. They offer the following key features for container and Kubernetes monitoring and observability, which are not available with Prometheus:

  • Dynamic Service Graph – A point-to-point, topographical representation of traffic flow and policy that shows how workloads within the cluster are communicating, and across which namespaces. Also includes advanced capabilities to filter resources, save views, and troubleshoot service issues.
  • DNS dashboard – Helps accelerate DNS-related troubleshooting and problem resolution in Kubernetes environments by providing an interactive UI with exclusive DNS metrics.
  • Dynamic Packet Capture – Captures packets from a specific pod or collection of pods with specified packet sizes and duration, in order to troubleshoot performance hotspots and connectivity issues faster.
  • Application-level observability – Provides a centralized, all-encompassing view of service-to-service traffic in the Kubernetes cluster to detect anomalous behavior, such as attempts to access restricted applications or URLs and scans for particular URLs.

Learn more about Calico for container and Kubernetes monitoring and observability

See Additional Guides on Key Performance Testing Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of performance testing.

  • Application Performance Monitoring – Authored by Granulate
  • Optimizing Python – Authored by Granulate
  • Lambda Performance – Authored by Lumigo
