Kubernetes is a popular container orchestrator, providing the abstraction needed to efficiently manage large-scale containerized applications. Kubernetes lets you use declarative configurations and provides advanced deployment mechanisms. It also offers self-healing and scaling capabilities.
Kubernetes can help you manage the lifecycle of a large number of containers. Production deployments typically span multiple clusters whose nodes host hundreds or thousands of containers, and these containers are constantly destroyed and recreated as the needs of Kubernetes workloads change.
When you manage containerized applications at scale, proactive Kubernetes monitoring helps you catch and debug errors quickly. Errors can occur at several levels of the application, including containers, nodes, and clusters. Kubernetes logging techniques and tools provide visibility into each of these levels, and the resulting logs help you track down errors and fine-tune the performance of your applications.
You should maintain logs for applications and other workloads running on Kubernetes. These logs are generated by the applications themselves at runtime and are usually written to the standard output (stdout) and standard error (stderr) streams of the container where the application runs.
The following Kubernetes components generate their own logs: etcd, kube-apiserver, kube-scheduler, kube-proxy, and kubelet.
These logs are usually stored in the /var/log directory of the machine running the service: a control plane node for control plane components, or any node (including worker nodes) for the kubelet and kube-proxy.
It is especially important to collect, aggregate, and monitor logs for the control plane, because performance or security issues affecting the control plane can put the entire cluster at risk.
Kubernetes Ingress is a widely used way to expose services in a Kubernetes cluster to external HTTP and HTTPS traffic. Logging ingress traffic is therefore important for tracking services, issues, errors, and cluster security.
Kubernetes events include information about errors and changes to resource state. For example, events may include scheduler decisions and reasons for pod deletion.
Events are API objects stored through the API server. By default, the API server deletes events one hour after they occur (controlled by its --event-ttl flag), so you need a mechanism for storing event data in a persistent location.
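A collector that copies events somewhere durable before they expire can be sketched in a few lines. The Python below is a minimal illustration, not a specific tool: it assumes events arrive as dictionaries shaped like the items in `kubectl get events -o json`, and the function name and JSON Lines archive file are hypothetical. A real exporter would usually watch the API rather than receive batches.

```python
import json
from pathlib import Path
from typing import Iterable


def archive_events(events: Iterable[dict], archive: Path, seen: set[tuple[str, str]]) -> int:
    """Append events that have not been archived yet to a JSON Lines file.

    An event is identified by (metadata.uid, metadata.resourceVersion), so an
    event that is updated (for example, its count is bumped) is stored again.
    Returns the number of newly written events.
    """
    written = 0
    with archive.open("a", encoding="utf-8") as out:
        for event in events:
            meta = event.get("metadata", {})
            key = (meta.get("uid", ""), meta.get("resourceVersion", ""))
            if key in seen:
                continue  # already archived this version of the event
            seen.add(key)
            out.write(json.dumps(event, sort_keys=True) + "\n")
            written += 1
    return written
```

Calling the function repeatedly with overlapping batches writes each event version only once, as long as the same `seen` set is passed in.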
The Kubernetes audit log records calls to the Kubernetes API, as selected by the cluster's audit policy. It provides a chronological sequence of the activities that led to the system's state at a specific point in time, which is important for forensic investigation of security incidents and for compliance reporting.
In Kubernetes, there are two main levels of logging:
When a pod is removed from a node, the kubelet deletes all of its container logs. When a container in the pod restarts, the kubelet keeps the log of the current container and of its most recent terminated instance (available with kubectl logs --previous), but deletes older logs. This means you cannot rely on the kubelet to retain logs for long-running pods.
Log rotation periodically moves the active log file aside, starts a new one, and deletes the oldest copies, so that only a bounded number of old versions is kept. For container logs, the kubelet can rotate files itself (see its containerLogMaxSize and containerLogMaxFiles settings); for other logs on the node, an open-source tool such as logrotate is a common choice for scheduled rotation. Logrotate can rotate logs based on time, file size, or both. Rotating on file size makes disk capacity easier to plan, because log files can quickly grow large and exhaust disk space on the node.
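To make the size criterion concrete, the sketch below shows what one size-based rotation step does. It mimics the behavior of logrotate's `size` and `rotate` options but is not logrotate itself; the function name and parameters are illustrative.

```python
from pathlib import Path


def rotate_if_needed(log: Path, max_bytes: int, max_files: int) -> bool:
    """Size-based rotation: when `log` reaches `max_bytes`, shift
    log.1 -> log.2, ..., log -> log.1 and keep at most `max_files`
    rotated copies (max_files >= 1). Returns True if a rotation happened.
    """
    if not log.exists() or log.stat().st_size < max_bytes:
        return False
    # Delete the oldest copy so the shift below never exceeds max_files.
    oldest = log.with_name(f"{log.name}.{max_files}")
    if oldest.exists():
        oldest.unlink()
    # Shift existing copies up by one, starting from the highest index.
    for i in range(max_files - 1, 0, -1):
        src = log.with_name(f"{log.name}.{i}")
        if src.exists():
            src.rename(log.with_name(f"{log.name}.{i + 1}"))
    log.rename(log.with_name(f"{log.name}.1"))
    log.touch()  # start a fresh, empty active log
    return True
```

A process that keeps the file open continues writing to the renamed copy, which is why logrotate also offers a `copytruncate` mode.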
There are two types of system components in Kubernetes. The first type, such as the kubelet and the container runtime, runs directly on the operating system and uses the standard operating system logging framework. On machines with systemd, these logs can be read with the Linux journalctl command; otherwise they are written to files in the /var/log/ directory.
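On systemd hosts, journalctl can emit one JSON object per line (`-o json`), which is easier to feed into other tools than plain text. This sketch assumes the kubelet runs as a systemd unit named `kubelet` (a common but not universal setup); the helper names are hypothetical.

```python
import json
import subprocess
from typing import Iterable, Iterator


def parse_journal_json(lines: Iterable[str]) -> Iterator[tuple[int, str, str]]:
    """Yield (timestamp_us, unit, message) from `journalctl -o json` output.

    __REALTIME_TIMESTAMP is a string of microseconds since the epoch. Entries
    whose MESSAGE is not a plain string (journald encodes binary messages as
    byte arrays) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue
        yield int(entry["__REALTIME_TIMESTAMP"]), entry.get("_SYSTEMD_UNIT", ""), message


def read_kubelet_journal(since: str = "1 hour ago") -> list[tuple[int, str, str]]:
    """Run journalctl for the (assumed) kubelet unit and parse its output."""
    out = subprocess.run(
        ["journalctl", "-u", "kubelet", "--since", since, "-o", "json", "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout
    return list(parse_journal_json(out.splitlines()))
```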
The second type of component, such as the API server and the cloud controller manager, runs in its own container. These components write their logs to stdout and stderr, the same mechanism used by every other container in the cluster.
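With a CRI runtime such as containerd or CRI-O, whatever a container writes to stdout and stderr ends up in files under /var/log/pods on the node, each line in the form `<timestamp> <stream> <tag> <message>`, where the tag `P` marks a partial chunk of a long line and `F` the final chunk. A minimal, illustrative parser that reassembles split lines might look like this; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LogRecord:
    timestamp: str  # RFC 3339 timestamp written by the runtime
    stream: str     # "stdout" or "stderr"
    message: str


def parse_cri_log(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Parse CRI-format container log lines and join partial chunks.

    Chunks are joined per stream, and the timestamp of the first chunk is kept.
    """
    pending: dict[str, LogRecord] = {}
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip malformed lines
        timestamp, stream, tag = parts[0], parts[1], parts[2]
        message = parts[3] if len(parts) == 4 else ""
        record = pending.pop(stream, None)
        if record is None:
            record = LogRecord(timestamp, stream, message)
        else:
            record.message += message
        if "P" in tag.split(":"):
            pending[stream] = record  # wait for the rest of the line
        else:
            yield record
```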
You can implement cluster-level logging by incorporating a node-level log agent on every node. There are two important things to consider:
It is preferable to run this agent using a DaemonSet, because it is required on every node. This lets you deploy the agent without any changes to running applications.
You can run a utility container, known as a sidecar, instead of running the agent as a DaemonSet. You can achieve this in two ways:
The following best practices can help you perform Kubernetes logging more effectively.
In Kubernetes, role-based access control (RBAC) is the usual mechanism for authorization, that is, for deciding whether an authenticated user may perform a given action. When audit logging is enabled, the API server annotates each audit event with the authorization decision (authorization.k8s.io/decision) and the reason for that decision (authorization.k8s.io/reason). To track access issues, enable audit logging on the kube-apiserver, for example with the --audit-policy-file and --audit-log-path flags.
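Once audit events are being written, denied requests can be pulled out by checking the decision annotation. This sketch assumes the API server's log backend in its default JSON format (one `audit.k8s.io/v1` Event per line); the function name and the summary fields it returns are illustrative choices.

```python
import json
from typing import Iterable, Iterator

DECISION = "authorization.k8s.io/decision"
REASON = "authorization.k8s.io/reason"


def denied_requests(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a short summary of each audit event the authorizer denied."""
    seen: set[str] = set()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        annotations = event.get("annotations") or {}
        if annotations.get(DECISION) != "forbid":
            continue
        # A request can be logged at several stages; report it once.
        audit_id = event.get("auditID", "")
        if audit_id in seen:
            continue
        seen.add(audit_id)
        ref = event.get("objectRef") or {}
        yield {
            "user": (event.get("user") or {}).get("username", ""),
            "verb": event.get("verb", ""),
            "resource": ref.get("resource", ""),
            "namespace": ref.get("namespace", ""),
            "reason": annotations.get(REASON, ""),
        }
```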
Multiple Kubernetes components generate logs, and these logs are typically aggregated and processed by several tools. To make aggregation easier, logs should be generated in a consistent format. Keep this in mind when you configure stdout and stderr output, and when you assign metadata and labels with Fluentd. Additionally, if you use Elasticsearch for large-scale log analysis, structured logs (for example, JSON) are easier to index and query than free-form text, because they need less parsing at ingestion time.
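As an illustration of consistent, structured output, the sketch below configures Python's standard `logging` module to write one JSON object per line to stdout, which collectors such as Fluentd or Filebeat can parse without custom patterns. The field names (`time`, `level`, `msg`) and the `fields` convention for extra data are assumptions, not a Kubernetes requirement.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Extra fields passed as extra={"fields": {...}} are merged in.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout, as containers should."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

For example, `make_logger("checkout").info("order placed", extra={"fields": {"order_id": "A17"}})` emits a single JSON line that includes `order_id`.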
Kubernetes logs can become difficult to manage at the cluster level because of their large volume. In Kubernetes, a DaemonSet ensures that a copy of a pod runs on every node (or on every node that matches a selector), which makes it a natural fit for running log collectors in the background.
You can use Filebeat or Fluentd to collect logs in Kubernetes, running them alongside your workloads as DaemonSets. To keep log collection within the resources available on each node, configure resource limits for each agent.
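The sketch below builds a DaemonSet manifest for a node-level log agent with explicit resource requests and limits, and mounts the node's /var/log read-only so the agent can read container logs. It is written in Python and emits JSON, which `kubectl apply -f -` accepts; the image name, the `logging` namespace (which must already exist), and the CPU and memory values are placeholders to adapt for Fluentd, Filebeat, or whichever agent you use.

```python
import json


def log_agent_daemonset(image: str, cpu: str = "200m", memory: str = "200Mi") -> dict:
    """Build a DaemonSet manifest for a node-level log agent (placeholder values)."""
    labels = {"app": "log-agent"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "log-agent", "namespace": "logging"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "agent",
                        "image": image,
                        # Cap what the agent can take from each node.
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                        # Read container and system logs from the host.
                        "volumeMounts": [
                            {"name": "varlog", "mountPath": "/var/log", "readOnly": True}
                        ],
                    }],
                    "volumes": [{"name": "varlog", "hostPath": {"path": "/var/log"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python daemonset.py | kubectl apply -f -
    print(json.dumps(log_agent_daemonset("example.registry/log-agent:1.0"), indent=2))
```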
Kubernetes can generate audit logs for API calls. For security purposes, enable audit logging and monitor the resulting logs so that you can analyze these operations for anomalous behavior.
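Audit logging is turned on by giving the kube-apiserver a policy file (`--audit-policy-file`) and a log destination (for example, `--audit-log-path`). The policy below is one possible starting point, generated as JSON from Python (JSON is also valid YAML); the choice of resources and levels is an assumption to adjust for your own requirements. Rules are evaluated in order, and the first matching rule sets the level.

```python
import json


def audit_policy() -> dict:
    """A minimal audit.k8s.io/v1 Policy (illustrative rule choices)."""
    return {
        "apiVersion": "audit.k8s.io/v1",
        "kind": "Policy",
        # Skip the extra event emitted as soon as a request arrives.
        "omitStages": ["RequestReceived"],
        "rules": [
            # Record who touched Secrets and ConfigMaps, but never their contents.
            {"level": "Metadata",
             "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
            # Keep full request and response bodies for pod changes.
            {"level": "RequestResponse",
             "verbs": ["create", "update", "patch", "delete"],
             "resources": [{"group": "", "resources": ["pods"]}]},
            # Everything else: metadata only.
            {"level": "Metadata"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(audit_policy(), indent=2))
```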
You can also use operational Kubernetes logs to detect anomalous behavior and monitor changes in applications, which can reveal security vulnerabilities or cyberattacks. Export logs to monitoring solutions, such as SIEM tools, to raise alerts automatically and to build custom dashboards and visualizations for Kubernetes security data.
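SIEM products define alert rules in their own languages, but many rules reduce to a count threshold over a sliding time window. The Python sketch below illustrates that idea only; the function name, window, and threshold are hypothetical, and the input could be, for example, (timestamp, username) pairs for denied API requests.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator


def threshold_alerts(
    events: Iterable[tuple[float, str]], window_s: float, threshold: int
) -> Iterator[tuple[float, str, int]]:
    """Yield (time, key, count) whenever a key has at least `threshold`
    events within the trailing `window_s` seconds.

    `events` must be (timestamp_seconds, key) pairs sorted by time.
    """
    recent = defaultdict(deque)  # key -> timestamps inside the window
    for ts, key in events:
        q = recent[key]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= ts - window_s:
            q.popleft()
        if len(q) >= threshold:
            yield ts, key, len(q)
```

This version fires on every event while a key stays at or above the threshold; a production rule would typically also suppress repeated alerts.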
Calico offers the following log types for use in troubleshooting, compliance, and resource management. These logs are all stored in Elasticsearch and can be accessed via the standard Elasticsearch API.
Standard Kubernetes RBAC configuration is used to provide granular access to the different sets of data archived in Elasticsearch.
Data retention limits may be configured for each of these. Additional destinations may be configured for long-term storage (e.g. Amazon S3, Syslog, and Splunk).
Calico also provides a variety of Prometheus metrics for monitoring: