Monitoring involves instrumenting an application to collect and analyze metrics and logs that provide insight into its performance. This allows you to see whether the application behaves correctly. Metrics are used to measure specific aspects of a system’s health over time, while logs record specific events.
Cloud-native monitoring differs from traditional application monitoring because monitoring systems must handle ephemeral objects that are frequently created and destroyed, as well as distributed applications made up of multiple independent components.
Like traditional monitoring, cloud-native monitoring should cover a range of parameters including disk space, memory consumption, and CPU usage, as well as whether tasks are performed correctly and are protected from unauthorized access. This is important for tracking performance, cost-effectiveness, and security, which enables operators to quickly respond to any issues detected. Monitoring is also a crucial part of cloud-native security.
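As a rough illustration of the host-level parameters mentioned above, the following sketch collects disk and CPU figures using only the Python standard library. The function name `resource_snapshot` and the dictionary keys are hypothetical. Memory consumption is left out because the standard library has no portable call for it, so a real agent would typically rely on a platform-specific source.

```python
import os
import shutil


def resource_snapshot(path="."):
    """Collect a few host-level metrics for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    snapshot = {
        # Fraction of the disk in use, between 0.0 and 1.0
        "disk_used_fraction": usage.used / usage.total if usage.total else 0.0,
        # Number of logical CPUs (may be None if it cannot be determined)
        "cpu_count": os.cpu_count(),
    }
    try:
        # One-minute load average; only available on Unix-like systems
        snapshot["load_avg_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        snapshot["load_avg_1m"] = None
    return snapshot


print(resource_snapshot())
```

In a cloud-native setting, a snapshot like this would be taken at regular intervals and shipped to a central system, since the host or container that produced it may no longer exist when someone investigates.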
Cloud-native architectures present challenges for application and infrastructure security. Here are several core challenges:
Infrastructure and DevOps teams use microservices to execute cloud-native applications. Previously, several software functionalities or processes would execute on a single virtual machine. Today, developers package each capability or process as a serverless function or separate container, making each entity more vulnerable. An organization needs to protect these entities from compromise throughout the development lifecycle.
Cloud-native systems can include a range of private and public clouds, application architectures, and cloud services. Each architectural pattern could display different weaknesses and security demands. Security teams need to visualize the attack surface and find solutions for securing each type of architecture.
Private and public cloud environments are continuously evolving. Rapid release cycles might mean that security teams need to update every component of a microservices application daily. Furthermore, organizations are adopting practices such as infrastructure as code (IaC) and immutability, and so applications are continuously being destroyed and re-created. Security teams find it challenging to secure such deployments without slowing down the release cycle.
A related issue is over-privileged, non-administrator user accounts. Do not allow users to run programs as root, as this creates a variety of security issues.
A known example of this is Docker-based containers running as root. Just because it is possible to run containers as root does not mean it is a recommended practice. Always try to employ the principle of least privilege. Developers working on a cloud-native application, and entities within the application architecture, should only gain access to the resources they actually need.
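One way to enforce the advice against running as root is a startup guard inside the application itself. The sketch below is a minimal example, assuming a Unix-like container image. The function names `running_as_root` and `refuse_root` are hypothetical, and this complements rather than replaces setting a non-root user in the container configuration.

```python
import os
import sys


def running_as_root():
    """Return True when the effective user ID is 0 (Unix-like systems only)."""
    geteuid = getattr(os, "geteuid", None)  # not available on Windows
    return geteuid is not None and geteuid() == 0


def refuse_root(allow_root=False):
    """Exit early if the process was started as root without explicit opt-in."""
    if running_as_root() and not allow_root:
        sys.exit("refusing to start as root; run the container as a non-root user")
```

Calling `refuse_root()` at the top of an entry point makes an over-privileged deployment fail fast instead of running unnoticed.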
Another typical security gap is misconfigurations. It is very common for databases, cloud storage buckets, and other cloud resources to be left accessible over the Internet without authentication. Take all measures to prevent such misconfigurations. The solution is not to fix them individually, but to use automation to gain visibility over these issues and remediate them centrally.
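The idea of central, automated detection can be sketched as a scan over a resource inventory. The example below is illustrative only: the `Resource` fields and the sample inventory are hypothetical, and a real scan would populate them from the cloud provider's APIs.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str                 # e.g. "bucket" or "database"
    publicly_reachable: bool  # reachable over the Internet
    requires_auth: bool       # access requires authentication


def find_exposed(resources):
    """Return resources that are reachable from the Internet without authentication."""
    return [r for r in resources if r.publicly_reachable and not r.requires_auth]


# Hypothetical inventory for illustration
inventory = [
    Resource("reports-bucket", "bucket", publicly_reachable=True, requires_auth=False),
    Resource("orders-db", "database", publicly_reachable=False, requires_auth=True),
    Resource("public-site", "bucket", publicly_reachable=True, requires_auth=True),
]

for resource in find_exposed(inventory):
    print(f"exposed {resource.kind}: {resource.name}")
```

Running one check across the whole inventory gives a single list to remediate, rather than relying on each team to audit its own resources.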
There are four pillars for capturing data and ensuring observability, which provide crucial insights into the health and behavior of cloud-native applications. The data collected from each of these pillars can be used to evaluate systems and applications as they are developed and become more complex.
Logs are records of events. Every service or application in your system should log each event as it occurs. You can use a log aggregation tool to centralize logs and make them easier to search and view. For example, if an error occurs, the application logs it, so developers can identify where the issue is.
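Logs are easier to aggregate and search when each entry is structured. The following sketch uses Python's standard `logging` module to emit one JSON object per event. The `JsonFormatter` class, the `build_logger` helper, and the `checkout` service name are hypothetical choices for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback so the failing location can be found later
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(service_name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.propagate = False
    return logger


# Usage: capture the output in memory so it can be inspected
buffer = io.StringIO()
log = build_logger("checkout", stream=buffer)
try:
    1 / 0
except ZeroDivisionError:
    log.exception("payment calculation failed")

entry = json.loads(buffer.getvalue())
print(entry["level"], entry["service"], entry["message"])
```

Because every entry carries the service name and severity as separate fields, an aggregation tool can filter by them without parsing free-form text.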
Metrics combine data from a series of related, measurable events. They tend to be time-based and are measured at regular intervals. This helps provide insights into the type of issue—for example, the number and rate of errors may be consistent or represent a spike in errors.
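To make the distinction between a steady error rate and a spike concrete, the sketch below buckets error timestamps into fixed intervals and compares the latest interval with the earlier ones. The function names, the 60-second interval, and the factor of 3 are assumptions chosen for illustration, not recommended defaults.

```python
from collections import Counter


def errors_per_interval(timestamps, interval_seconds=60):
    """Count error events per fixed-size interval, including empty intervals."""
    buckets = Counter(int(ts // interval_seconds) for ts in timestamps)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [buckets.get(i, 0) for i in range(first, last + 1)]


def is_spike(counts, factor=3.0):
    """Return True if the latest interval exceeds `factor` times the earlier average."""
    if len(counts) < 2:
        return False
    earlier = counts[:-1]
    baseline = sum(earlier) / len(earlier)
    # Use a floor of 1.0 so a near-zero baseline does not flag a single error
    return counts[-1] > factor * max(baseline, 1.0)


# Error timestamps in seconds since the start of observation
steady = [5, 70, 130, 185]
bursty = [5, 70, 130, 185, 190, 200, 210, 220, 230]

print(errors_per_interval(steady), is_spike(errors_per_interval(steady)))
print(errors_per_interval(bursty), is_spike(errors_per_interval(bursty)))
```

With these inputs, the steady series yields one error per minute, while the bursty series yields six errors in the final minute and is flagged as a spike.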
A cloud-native monitoring tool will typically provide the following metrics to enable different measurement types:
Tracing involves recording related events and presenting them in a meaningful order. All events in the chain being traced are linked via a unique ID that passes from the initial request to later events. In distributed systems, a single request can reach multiple services, so tracing helps provide a full, application-level view.
For example, in the case of an error, tracing reveals the overall flow from the initial request to the resulting error. If you can observe the trajectory of the request, you can identify which services it passed through and what may be the root cause.
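The mechanism can be sketched in a few lines: a unique ID is generated at the first service and passed along with every downstream call, so the recorded events can later be grouped and read in order. The `Tracer` class and the three services (`gateway`, `orders`, `inventory`) are hypothetical, and the calls happen in a single process here, whereas real services would pass the ID in request headers.

```python
import uuid


class Tracer:
    """Collect events from all services and group them by trace ID."""

    def __init__(self):
        self.spans = []

    def record(self, trace_id, service, event):
        self.spans.append({"trace_id": trace_id, "service": service, "event": event})

    def trace(self, trace_id):
        """Return the events for one request, in the order they were recorded."""
        return [s for s in self.spans if s["trace_id"] == trace_id]


def inventory(tracer, trace_id, item):
    tracer.record(trace_id, "inventory", "lookup " + item)
    if item == "missing-item":
        tracer.record(trace_id, "inventory", "error: item not found")
        return False
    return True


def orders(tracer, trace_id, item):
    tracer.record(trace_id, "orders", "create order")
    return inventory(tracer, trace_id, item)


def gateway(tracer, item):
    # The ID is created once, at the initial request, and passed downstream
    trace_id = uuid.uuid4().hex
    tracer.record(trace_id, "gateway", "request received")
    orders(tracer, trace_id, item)
    return trace_id


tracer = Tracer()
ok_id = gateway(tracer, "widget")
failed_id = gateway(tracer, "missing-item")

failed_path = [(s["service"], s["event"]) for s in tracer.trace(failed_id)]
for service, event in failed_path:
    print(f"{service}: {event}")
```

Filtering by the failed request's ID shows that it passed through the gateway and the orders service before the error was raised in the inventory service, even though events from both requests are stored together.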
Alerts draw the attention of developers to a potential issue so they can address it. Alerting tools detect patterns in the data provided by logs, metrics, and tracing to identify anomalies. Activity that departs from the system’s normal state will trigger an alert.
When engineers identify an event (or set of events), they can generate alerts and modify them based on their level of priority. For example, they can set alerts to trigger based on specific thresholds, such as the number or rate of errors. The relevant team receives the alerts and can begin remediating the issue.
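A threshold-based rule with a priority level can be sketched as follows. The `AlertRule` fields, the priority names, and the thresholds are hypothetical. In practice these rules would be defined in the alerting tool's own configuration rather than in application code.

```python
from dataclasses import dataclass

# Lower number means higher priority
PRIORITY_ORDER = {"critical": 0, "warning": 1}


@dataclass
class AlertRule:
    name: str
    threshold: float  # errors per minute at which the rule fires
    priority: str     # "critical" or "warning"


def evaluate(rules, error_rate):
    """Return the rules triggered by `error_rate`, highest priority first."""
    fired = [r for r in rules if error_rate >= r.threshold]
    return sorted(fired, key=lambda r: PRIORITY_ORDER[r.priority])


rules = [
    AlertRule("elevated error rate", threshold=5, priority="warning"),
    AlertRule("error rate outage", threshold=20, priority="critical"),
]

for rate in (1, 8, 25):
    names = [r.name for r in evaluate(rules, rate)]
    print(rate, names)
```

Sorting by priority means the receiving team sees the most urgent alert first when several thresholds are crossed at once.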
Here are a few important capabilities you should look for in a cloud-native monitoring solution.
In the cloud-native datacenter, you must have visibility across VMs, the host, applications, API services, and containers. A monitoring solution must provide visibility even if services are dynamic, containers short-lived, and applications distributed. The monitoring solution must have an engine that can intelligently gather information from these different layers to enable real-time decision making.
Security investigations involving cloud-native workloads can be complex because there are multiple, distributed components communicating via API. Effective security investigations require a monitoring architecture that is distributed, can scale according to workloads, and provides sufficient data retention.
In a cloud-native environment, you could have OpenShift, Kubernetes, Google GKE, Amazon EKS, ECS, or similar services orchestrating your container workloads. You may also use Puppet, Ansible, or Chef to automate deployments.
A monitoring solution should smoothly integrate with these components. In many cases, cloud-native monitoring solutions will provide the option of deploying an agent within a container cluster or alongside serverless functions—a necessity for the cloud-native environment.
Calico offers powerful features for cloud-native monitoring and full-stack observability. These include:
Next Steps