Observability and Troubleshooting
Connectivity issues between microservices are difficult to troubleshoot, and identifying and resolving them often requires collaboration between multiple teams.
Calico Enterprise and Calico Cloud offer tools to rapidly pinpoint and resolve the source of a connectivity issue between your microservices running on Kubernetes clusters, as well as tools to identify and resolve potential connectivity issues before they happen.
Dynamic Service Graph
The Dynamic Service Graph, available in Calico Enterprise and Calico Cloud, provides visibility across the stack from network to application layer (L3 – L7). It provides the most accurate and relevant view of how services are operating in your Kubernetes cluster.
The Dynamic Service Graph generates a detailed visualization of the cluster environment that shows how microservices behave and interact with each other at run-time, which simplifies debugging. It gives DevOps teams, SREs, and service owners a point-to-point, topological representation of network traffic within a cluster, showing how workloads communicate and across which namespaces.
Along with this information, the Dynamic Service Graph provides metadata on ports, protocols, how network policies are being evaluated, and other details that help Kubernetes teams understand how end-to-end communication is occurring. Performance hotspots are automatically identified and highlighted, and alerts are provided in the context of the Service Graph. The Dynamic Service Graph also includes advanced capabilities to filter resources, save views, and troubleshoot DNS issues.
Calico Enterprise and Calico Cloud log all connection attempts between microservices as well as performance metrics for those connections.
What makes this approach unique is that important Kubernetes metadata is included with each log entry, including:
- Source and destination namespaces
- Source and destination pods and labels
- Which policies evaluated the connection, and
- Whether the connection was accepted or denied and why
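As a rough illustration of what such an enriched log entry carries, the sketch below models one record in Python. The class name, field names, and sample values are assumptions made for this example and are not the product's actual log schema.

```python
from dataclasses import dataclass


@dataclass
class FlowLogEntry:
    """Illustrative flow log record (hypothetical field names)."""
    source_namespace: str
    source_pod: str
    source_labels: dict
    dest_namespace: str
    dest_pod: str
    dest_labels: dict
    policies: tuple  # names of the policies that evaluated the connection
    action: str      # "allow" or "deny"
    reason: str      # why the connection was accepted or denied


def summarize(entry):
    """Render one entry as a single human-readable line."""
    policies = ", ".join(entry.policies)
    return (
        f"{entry.source_namespace}/{entry.source_pod} -> "
        f"{entry.dest_namespace}/{entry.dest_pod}: {entry.action} ({policies})"
    )


# Hypothetical sample record for a denied connection.
entry = FlowLogEntry(
    source_namespace="storefront",
    source_pod="frontend-7d9c",
    source_labels={"app": "frontend"},
    dest_namespace="storefront",
    dest_pod="backend-5f6b",
    dest_labels={"app": "backend"},
    policies=("default-deny",),
    action="deny",
    reason="no policy allows frontend to reach backend",
)

print(summarize(entry))
```

Because the Kubernetes context travels with every entry, a single record is enough to tell who was talking to whom and which policy made the decision.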
The flow logs are stored in a central data store that is shared across all of your clusters. Once set up, Calico Enterprise and Calico Cloud can then monitor and report on microservices connections across all your clusters.
Flow log data can be queried with Kibana, which is often used for compliance reporting. You can also use the Flow Visualizer to visualize the connections and interactively drill down to identify where connections are being dropped.
The Flow Visualizer queries your flow log data and renders an interactive graph that visualizes your microservice connections, the volume of connections, and whether the connection was successful.
Using filters, you can drill down into specific namespaces, workloads, and connection statuses. Highlight any connection to see its performance metrics, the security policies that evaluated the traffic, and whether each policy allowed or denied the connection.
For example, if microservice A is unable to connect to microservice B, you can set a filter in the Flow Visualizer to display only denied connections, and then filter by microservice A. You then see all denied connection attempts from microservice A. Clicking the connection you are troubleshooting shows traffic statistics as well as the security policy that blocked the connection.
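The same two-step filter can be sketched over exported flow records. This is only an illustration of the filtering logic, assuming each record is a plain dictionary; the keys `source`, `dest`, `action`, and `policy` and the sample values are hypothetical, not the Flow Visualizer's data model.

```python
def denied_from(flows, source):
    """Return the flows that originate at `source` and were denied."""
    return [f for f in flows if f["source"] == source and f["action"] == "deny"]


# Hypothetical flow records.
flows = [
    {"source": "microservice-a", "dest": "microservice-b",
     "action": "deny", "policy": "default-deny"},
    {"source": "microservice-a", "dest": "microservice-c",
     "action": "allow", "policy": "allow-a-to-c"},
    {"source": "microservice-d", "dest": "microservice-b",
     "action": "deny", "policy": "default-deny"},
]

# Show only denied connections from microservice A and the policy responsible.
for flow in denied_from(flows, "microservice-a"):
    print(f'{flow["source"]} -> {flow["dest"]} blocked by {flow["policy"]}')
```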
Kubernetes network policies are the Kubernetes-standard approach to securing traffic within a cluster. The challenge with these policies is that they are enforced as soon as they are applied. A typo or a forgotten dependency can cause connectivity issues between your microservices that then need to be debugged.
Policy Preview in Calico Enterprise and Calico Cloud evaluates historic connections between your microservices and reports on which connections will be accepted or denied when the policy is committed. With Policy Preview you can have confidence that you are not breaking things or creating a regression resulting from changes to your security policies.
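The idea of replaying recorded traffic against a policy that has not yet been committed can be sketched in a few lines. This is a conceptual illustration only, not how Policy Preview is implemented: the candidate policy is reduced to a hypothetical set of allowed `(source, destination, port)` tuples, and the function and workload names are invented for the example.

```python
def preview(policy_allows, historic_flows):
    """Replay recorded flows against a candidate policy without enforcing it."""
    report = {"allow": [], "deny": []}
    for flow in historic_flows:
        verdict = "allow" if flow in policy_allows else "deny"
        report[verdict].append(flow)
    return report


# Hypothetical history of (source, destination, port) connections.
historic_flows = [
    ("frontend", "backend", 8080),
    ("backend", "database", 5432),
    ("frontend", "database", 5432),
]

# Hypothetical candidate policy, expressed as the set of flows it would allow.
candidate = {
    ("frontend", "backend", 8080),
    ("backend", "database", 5432),
}

report = preview(candidate, historic_flows)
print("would be denied:", report["deny"])
```

Any flow that lands in the `deny` list but is expected to work points to a typo or missing dependency in the candidate policy, found before the policy affects live traffic.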
Calico Enterprise and Calico Cloud can run your security policies in a “Staged” mode. When in staged mode, you can run policies in a permissive manner that allows all connections, but logs any that would have been accepted or denied. This enables you to safely automate security-as-code as part of your continuous deployment process, with a manual or automated gate to evaluate the results of security changes before deciding to commit and enforce the policy.
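A minimal sketch of the staged behavior and an automated gate follows, under stated assumptions: the policy is modeled as a hypothetical set of allowed `(source, destination)` pairs, and `handle_staged`, `safe_to_enforce`, and the `max_would_deny` threshold are names invented for this example rather than product APIs.

```python
def handle_staged(connection, staged_allows, log):
    """Staged mode: always permit the connection, but record the would-be verdict."""
    would = "allow" if connection in staged_allows else "deny"
    log.append((connection, would))
    return True  # traffic is never blocked while the policy is staged


def safe_to_enforce(log, max_would_deny=0):
    """Gate: approve enforcement only if few enough connections would be denied."""
    would_deny = sum(1 for _, verdict in log if verdict == "deny")
    return would_deny <= max_would_deny


# Hypothetical staged policy and observed connections.
staged_allows = {("frontend", "backend")}
log = []
observed = [("frontend", "backend"), ("frontend", "database")]
results = [handle_staged(conn, staged_allows, log) for conn in observed]

print("all traffic permitted:", all(results))
print("safe to enforce:", safe_to_enforce(log))
```

In a deployment pipeline, a check like `safe_to_enforce` could run automatically, or the logged verdicts could be handed to a person for review before the policy is committed.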