Kubernetes Logging: Approaches and Best Practices

What is Kubernetes Logging?

Kubernetes is a popular container orchestrator, providing the abstraction needed to efficiently manage large-scale containerized applications. Kubernetes lets you use declarative configurations and provides advanced deployment mechanisms. It also offers self-healing and scaling capabilities.

Kubernetes can help you manage the lifecycle of a large number of containers. Typically, deploying Kubernetes in production involves the use of multiple clusters with nodes hosting hundreds or thousands of containers. These containers are constantly being destroyed and spun up according to the needs of Kubernetes workloads.

When managing containerized applications at scale, it is important to monitor Kubernetes proactively so you can debug errors quickly. Errors can occur at various levels, including containers, nodes, and clusters. Kubernetes logging techniques and tools provide visibility into these elements, and the resulting logs help you track errors and fine-tune the performance of your applications.

In this article, you will learn:

What Should You Log in Kubernetes?
Kubernetes Logging Architecture and Methods
Kubernetes Logging Best Practices
Kubernetes Logging with Calico

What Should You Log in Kubernetes?

Application Logs

You should maintain logs for applications and other workloads running on Kubernetes. These logs are generated by the applications themselves at runtime and are usually written to the standard output (stdout) and standard error (stderr) streams of the container where the application runs.
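As an illustration of an application writing its logs to stdout, here is a minimal Python sketch that emits one JSON object per line. The logger name demo-app and the field names ts, level, logger, and message are assumptions chosen for this example, not a Kubernetes requirement.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str = "demo-app") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (hypothetical logger name)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order %s accepted", 1234)
```

Because the output goes to stdout, the container runtime captures it without any change to the container image.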

Kubernetes Cluster Component Logs

The following Kubernetes components generate their own logs: etcd, kube-apiserver, kube-scheduler, kube-proxy, and kubelet.

These logs are usually stored in the /var/log directory of the machine running the service (a master node for control plane components, or a worker node for the kubelet).

It is especially important to collect, aggregate, and monitor logs for the control plane, because performance or security issues affecting the control plane can put the entire cluster at risk.

Kubernetes Ingress Logging

Kubernetes ingress is now the standard for exposing services in a Kubernetes cluster to external HTTP/S traffic. Therefore, logging ingress traffic is very important for tracking services, issues, errors, and cluster security.

Kubernetes Events

Kubernetes events include information about errors and changes to resource state. For example, events may include scheduler decisions and reasons for pod deletion.

Events are API objects stored on the API server. By default, Kubernetes drops event data 60 minutes after events are fired, so you need to have a mechanism for storing event data in a persistent location.
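One possible way to keep events beyond their retention window is to archive them periodically. The sketch below assumes you already have the items array from `kubectl get events -o json` as Python dictionaries. The function name archive_events and the JSON Lines archive file are hypothetical choices for this example.

```python
import json
from pathlib import Path


def archive_events(items, archive_path):
    """Append events that are not yet archived to a JSON Lines file.

    An event is identified by (metadata.uid, metadata.resourceVersion), so an
    updated event (for example, one with an increased count) is stored again.
    Returns the number of events appended.
    """
    path = Path(archive_path)
    seen = set()
    if path.exists():
        with path.open() as fh:
            for line in fh:
                if line.strip():
                    meta = json.loads(line)["metadata"]
                    seen.add((meta["uid"], meta.get("resourceVersion")))

    added = 0
    with path.open("a") as fh:
        for event in items:
            meta = event["metadata"]
            key = (meta["uid"], meta.get("resourceVersion"))
            if key in seen:
                continue
            fh.write(json.dumps(event) + "\n")
            seen.add(key)
            added += 1
    return added
```

Running this on a schedule shorter than the retention window would, under these assumptions, capture every event before it is dropped. A production setup would more likely ship events to a log backend than to a local file.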

Kubernetes Audit Logs

The Kubernetes audit log details all calls to the Kubernetes API. It provides a sequence of activities, in time series format, leading to a system state at a specific point in time. This is important for forensic investigation of security incidents and for compliance reporting.
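To show how an audit log can be turned into a sequence of activities, the following sketch reads audit events stored as one JSON object per line and returns them in time order. It assumes the audit.k8s.io/v1 event fields stage, requestReceivedTimestamp, user.username, verb, and objectRef, and that all timestamps share the same format so they sort lexically. The function name audit_timeline is hypothetical.

```python
import json


def audit_timeline(lines, resource=None):
    """Return (timestamp, user, verb, target) tuples sorted by time.

    Only ResponseComplete events are kept, so each request appears once.
    Pass resource (for example "secrets") to narrow the timeline.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("stage") != "ResponseComplete":
            continue
        ref = event.get("objectRef") or {}
        if resource and ref.get("resource") != resource:
            continue
        target = "/".join(
            part for part in (ref.get("namespace"), ref.get("resource"), ref.get("name")) if part
        )
        rows.append((
            event.get("requestReceivedTimestamp", ""),
            (event.get("user") or {}).get("username", "unknown"),
            event.get("verb", ""),
            target,
        ))
    return sorted(rows)
```

For a forensic review, you might call audit_timeline(open(path), resource="secrets") to see who touched secrets and in what order.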

Kubernetes Logging Architecture and Methods

In Kubernetes, there are two main levels of logging:

Container-level logging – Logs are generated by containers using stdout and stderr, and can be accessed using the logs command in kubectl. The container runtime captures these streams and writes them to log files on the node, which the kubelet locates and reads automatically.
Node-level logging – This includes actual log files saved at the node level. You can remotely view and delete these logs.
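To make the link between the two levels concrete, here is a sketch that parses one line of a container log file as written on the node. It assumes a CRI-compatible runtime such as containerd, which writes each line as a timestamp, the stream name, a P (partial) or F (full) tag, and the message. Other runtimes or logging drivers may use a different format. The names CriLogLine and parse_cri_line are hypothetical.

```python
from typing import NamedTuple, Optional


class CriLogLine(NamedTuple):
    timestamp: str
    stream: str      # "stdout" or "stderr"
    partial: bool    # True when the runtime split a long line
    message: str


def parse_cri_line(line: str) -> Optional[CriLogLine]:
    """Parse '<timestamp> <stream> <P|F> <message>'; return None if it does not match."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        return None
    if parts[1] not in ("stdout", "stderr") or parts[2] not in ("P", "F"):
        return None
    message = parts[3] if len(parts) == 4 else ""
    return CriLogLine(parts[0], parts[1], parts[2] == "P", message)
```

A node-level agent performs this kind of parsing before attaching metadata and forwarding the record.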

Kubernetes Log Rotation

When a Kubernetes pod is removed from a node, the kubelet deletes all of its logs. When a container restarts, the kubelet retains the current log and the most recent log from before the restart, but deletes older logs. This means you cannot rely on the kubelet to keep a complete log history for long-running pods.

Log rotation is a mechanism that archives the current version of a log before it is replaced by a new one. You will need to use one of several open-source tools to handle scheduled log rotation; logrotate is a common choice. It can rotate logs based on time, file size, or both. It is recommended to use the file size criterion when planning for disk capacity, because log files can quickly grow large and exhaust disk space on the node.
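The size-based criterion can be illustrated with Python's standard library. This sketch is a stand-in for the idea, not logrotate itself: it uses logging.handlers.RotatingFileHandler, and the function name make_rotating_logger and the default sizes are assumptions for the example.

```python
import logging
import logging.handlers


def make_rotating_logger(path, max_bytes=10 * 1024 * 1024, backups=5):
    """Return a logger whose file is rotated once it would exceed max_bytes.

    At most `backups` older files are kept (path.1, path.2, ...), so the
    worst-case disk usage is roughly max_bytes * (backups + 1).
    """
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

The bound in the docstring is what makes the size criterion useful for capacity planning: the number of files and the size of each are both capped.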

Logging for Operating System Components vs. Container-Based Components

There are two types of system components in Kubernetes. The first type, such as the kubelet and the container runtime, runs directly on the operating system and uses the standard operating system logging framework. These logs can be accessed via the Linux journalctl command, or in the /var/log/ directory.

The second type of Kubernetes component, such as the API server and the cloud controller manager, runs in its own container. The logs generated by these components use the same mechanism as other containers in the cluster: stdout and stderr.

Using a Node Logging Agent

You can implement cluster-level logging by running a node-level logging agent on every node. Two components are involved:

Log proxy – A specialized tool for publishing logs or sending them to the backend.
Log agent – A container that has access to the directory holding the log files of all application containers on a node. Node-level agents gather these logs so they can be aggregated.

It is preferable to run this agent using a DaemonSet, because it is required on every node. This lets you deploy the agent without any changes to running applications.
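A DaemonSet for a node-level agent might look like the manifest built by this sketch. It is generated as a Python dictionary and printed as JSON, which kubectl accepts alongside YAML. The image example.com/log-agent:1.0, the logging namespace, and the log-agent name are placeholders, and a real agent would also need its own configuration and usually a service account.

```python
import json


def log_agent_daemonset(image, namespace="logging", name="log-agent"):
    """Build a DaemonSet manifest that mounts the node's /var/log read-only."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # The agent reads node log files through this mount.
                        "volumeMounts": [
                            {"name": "varlog", "mountPath": "/var/log", "readOnly": True}
                        ],
                    }],
                    "volumes": [
                        {"name": "varlog", "hostPath": {"path": "/var/log"}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # example.com/log-agent:1.0 is a placeholder image, not a real one.
    print(json.dumps(log_agent_daemonset("example.com/log-agent:1.0"), indent=2))
```

Because the agent reads log files from the host, application pods need no changes.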

Utility Containers

You can run a utility container, known as a sidecar, instead of running the agent as a DaemonSet. You can achieve this in two ways:

Streaming sidecar – The sidecar reads log files written by the application and streams them to its own stdout, where a node-level agent can collect them.
Send logs directly to a central repository – The sidecar runs a logging agent that ships logs straight to the backend. You don’t need to make changes to container images, but you do need to adjust the deployment specifications of each deployed application. This is a suitable option for cases where containers cannot be run with elevated privileges, or if you want to make sure that applications can only access their own log stores.
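One common sidecar arrangement has the application write to a file on a shared volume while a sidecar streams that file to its own stdout. The sketch below builds such a pod manifest as a Python dictionary. The container names app and log-streamer, the volume name app-logs, and the default log path are assumptions for this example.

```python
def pod_with_streaming_sidecar(name, app_image, log_file="/var/log/app/app.log"):
    """Build a Pod manifest with an app container and a log-streaming sidecar.

    Both containers mount the same emptyDir volume; the sidecar tails the
    application's log file so its content appears on the sidecar's stdout.
    """
    log_dir = log_file.rsplit("/", 1)[0]
    mount = {"name": "app-logs", "mountPath": log_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": "app", "image": app_image, "volumeMounts": [mount]},
                {
                    "name": "log-streamer",
                    "image": "busybox",
                    "command": ["/bin/sh", "-c", f"tail -n+1 -F {log_file}"],
                    "volumeMounts": [dict(mount, readOnly=True)],
                },
            ],
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
        },
    }
```

The same layout works for a sidecar that runs a full logging agent: replace the tail command with the agent's image and configuration.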

Kubernetes Logging Best Practices

The following best practices can help you perform Kubernetes logging more effectively.

Control Access to Logs with RBAC

In Kubernetes, authorization typically relies on role-based access control (RBAC) to validate access and permissions. When audit logging is enabled, each recorded request is annotated with the authorization decision (authorization.k8s.io/decision) and the reason the system allowed or denied the request (authorization.k8s.io/reason). Enable audit logging to track authorization issues. Audit logging is configured on the API server through an audit policy, not through kubectl.
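The two annotations can be used to pull denied requests out of an audit log. This sketch assumes audit events have already been parsed into dictionaries and that a denied request carries the decision value "forbid". The function name denied_requests is hypothetical.

```python
DECISION = "authorization.k8s.io/decision"
REASON = "authorization.k8s.io/reason"


def denied_requests(events):
    """Return a summary of every audit event whose authorization decision is 'forbid'."""
    denied = []
    for event in events:
        annotations = event.get("annotations") or {}
        if annotations.get(DECISION) != "forbid":
            continue
        ref = event.get("objectRef") or {}
        denied.append({
            "user": (event.get("user") or {}).get("username", "unknown"),
            "verb": event.get("verb", ""),
            "resource": ref.get("resource", ""),
            "reason": annotations.get(REASON, ""),
        })
    return denied
```

A recurring user in this list often points to a missing role binding or to an account probing for access.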

Keep Log Formats Consistent

Multiple Kubernetes components generate logs, and these logs are typically aggregated and processed by several tools. To make aggregation easier, logs should be generated in a consistent format. Keep this in mind when you configure stdout and stderr, and when you assign metadata and labels with Fluentd. Additionally, structured logs reduce latency if you use Elasticsearch for large-scale log analysis.
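When sources cannot all be changed to emit the same format, records can be normalized at collection time. The sketch below maps differently named JSON fields, or plain text lines, onto one schema. The target keys ts, level, message, and source and the alias lists are assumptions for this example, not a standard.

```python
import json

# Hypothetical aliases seen across different loggers, mapped to one target key.
FIELD_ALIASES = {
    "ts": ("ts", "time", "timestamp", "@timestamp"),
    "level": ("level", "severity", "lvl"),
    "message": ("message", "msg", "log"),
}


def normalize(line, source):
    """Return a record with the keys ts, level, message, and source."""
    line = line.strip()
    try:
        raw = json.loads(line)
    except ValueError:
        raw = None
    if not isinstance(raw, dict):
        # Plain text: keep the whole line as the message.
        return {"ts": None, "level": "INFO", "message": line, "source": source}

    record = {"source": source}
    for target, aliases in FIELD_ALIASES.items():
        record[target] = next((raw[a] for a in aliases if a in raw), None)
    record["level"] = str(record["level"] or "INFO").upper()
    return record
```

In practice this step usually lives in the collector's configuration, but the logic is the same: choose one schema and map every source onto it.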

Related content: Read our guide to Kubernetes monitoring tools.

Set Resource Limits on Log Collection Daemons

Kubernetes logs can become difficult to manage at the cluster level because of their sheer volume. In Kubernetes, a DaemonSet ensures that a copy of a pod runs on every node, or on every node that meets certain criteria, which makes it a natural fit for background services such as log collectors.

You can use Filebeat or Fluentd to collect logs in Kubernetes, running them alongside your workloads as DaemonSets. To ensure that log collection stays within the available system resources, configure resource limits for each daemon.
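Resource limits are set per container in the pod template. The sketch below adds requests and limits to every container in a workload manifest held as a Python dictionary. The CPU and memory values are illustrative assumptions, not recommendations, and the function name with_resource_limits is hypothetical.

```python
import copy

# Illustrative values only; size them from the agent's observed usage.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "200Mi"},
    "limits": {"cpu": "500m", "memory": "500Mi"},
}


def with_resource_limits(manifest, resources=DEFAULT_RESOURCES):
    """Return a copy of a DaemonSet-style manifest with resources set on each container."""
    updated = copy.deepcopy(manifest)
    for container in updated["spec"]["template"]["spec"]["containers"]:
        container["resources"] = copy.deepcopy(resources)
    return updated


if __name__ == "__main__":
    daemonset = {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "log-agent"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "log-agent", "image": "example.com/log-agent:1.0"}  # placeholder image
        ]}}},
    }
    limited = with_resource_limits(daemonset)
    print(limited["spec"]["template"]["spec"]["containers"][0]["resources"])
```

A memory limit that is too low will cause the agent to be killed under load, so it is worth checking actual usage before tightening these numbers.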

Export Kubernetes Logs to SIEM

Kubernetes lets you generate audit logs for API invocations. For security purposes, enable audit logging and monitor the resulting logs so that you can analyze these operations for anomalous behavior.

You can also leverage operational Kubernetes monitoring logs to analyze anomalous behavior and monitor changes in applications. This can help you learn of security vulnerabilities or cyberattacks. Export logs to monitoring solutions, like SIEM tools, to automatically raise alerts and set up customized dashboards and visualizations for Kubernetes security data.
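How logs are exported depends entirely on the SIEM in use. As a rough sketch, the code below batches events as newline-delimited JSON and prepares an HTTP POST with Python's standard library. The intake URL, the bearer token authentication, and the NDJSON content type are all assumptions, so check your SIEM's intake API for the real requirements.

```python
import json
import urllib.request


def build_siem_request(events, url, token):
    """Prepare (but do not send) a POST request carrying events as NDJSON."""
    body = "\n".join(json.dumps(event, sort_keys=True) for event in events) + "\n"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the batch would then be a call such as urllib.request.urlopen(request, timeout=10), ideally wrapped in retry logic so that a brief SIEM outage does not lose events.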

Kubernetes Logging with Calico

Calico offers the following log types for use in troubleshooting, compliance, and resource management. These logs are all stored in Elasticsearch and can be accessed via the standard Elasticsearch API.

Audit logs – With Calico, audit event logs for Calico resources are pushed to Elasticsearch. By default, logs are recorded for create, patch, update, and delete events at the RequestResponse level for the majority of the Calico-specific resources. The exact set of verbs, stages, and resource types is configurable using standard Kubernetes audit policy.
BGP logs – Calico pushes BGP activity logs to Elasticsearch. These are raw messages that may be queried by time, node, and IP version.
DNS logs – Calico pushes DNS activity logs to Elasticsearch for DNS information obtained from trusted DNS servers. You can query these logs once a set of them has accumulated. Each log is a JSON blob with 18 key/value pairs. DNS logs of low significance may be suppressed using filters.
Flow logs – Calico pushes flow logs to Elasticsearch. This includes L3 connectivity information, Kubernetes metadata such as pod namespace and labels, service ports, connection and traffic statistics, NetworkPolicy trace, TCP statistics, and process information.
L7 logs – Calico implements an Envoy log collector so L7 metrics can be collected with or without a service mesh. Each log is a JSON blob with 22 key/value pairs, among them bytes_in, bytes_out, duration_mean, duration_max, src_name_aggr, src_namespace, and src_type.
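Once L7 log records have been retrieved, they can be summarized with a few lines of code. The sketch below assumes the records are already available as Python dictionaries carrying the bytes_in, bytes_out, duration_max, and src_namespace keys named above. Retrieving them from Elasticsearch is out of scope here, and the function name summarize_l7 is hypothetical.

```python
from collections import defaultdict


def summarize_l7(records):
    """Aggregate traffic volume and worst-case duration per source namespace."""
    totals = defaultdict(lambda: {"bytes_in": 0, "bytes_out": 0, "duration_max": 0})
    for rec in records:
        entry = totals[rec.get("src_namespace", "unknown")]
        entry["bytes_in"] += rec.get("bytes_in", 0)
        entry["bytes_out"] += rec.get("bytes_out", 0)
        entry["duration_max"] = max(entry["duration_max"], rec.get("duration_max", 0))
    return dict(totals)
```

A summary like this gives a quick view of which namespaces generate the most traffic or the slowest requests before you drill into individual records.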

Standard Kubernetes RBAC configuration is used to provide granular access to the different sets of data archived in Elasticsearch.

Data retention limits may be configured for each of these. Additional destinations may be configured for long-term storage (e.g. Amazon S3, Syslog, and Splunk).

Calico also provides a variety of Prometheus metrics for monitoring:

BGP metrics – Calico provides time series statistics for BGP peers, imported routes, and route updates.
License metrics – Calico provides time series statistics for license information. This allows monitoring of days to expiration, number of nodes available, and number of nodes used (from a licensing perspective).
Felix metrics – Calico can be configured to provide time series statistics covering a variety of low-level information, as well as CPU and memory usage reported by Felix, the main Calico networking and network policy agent.
Kube-controllers metrics – Calico can be configured to provide time series statistics of IP, CPU, and memory usage reported by the Calico kube-controllers component.
NetworkPolicy metrics – Calico enables you to monitor the effects of policies configured in your cluster. By defining a set of simple rules and thresholds, you can monitor traffic metrics and receive alerts when configured thresholds are exceeded.
Elasticsearch and Fluentd metrics – Calico exports usage and operational metrics for Elasticsearch and Fluentd to monitor the health of the logging pipeline.

Learn more about Calico for Kubernetes monitoring and observability
