DNS troubleshooting for Kubernetes applications with Calico DNS dashboards

Within Kubernetes, the Domain Name System (DNS) plays a pivotal role in facilitating service discovery, allowing pods to effectively locate and interact with other services within the cluster. For organizations transitioning their workloads to Kubernetes, establishing connectivity with services external to the cluster is equally important.

In this context, DNS serves the crucial purpose of resolving external service names to their respective IP addresses. The DNS functionality in Kubernetes is typically implemented using a set of CoreDNS pods that are exposed as a service called kube-dns.

By default, Kubernetes does not provide DNS troubleshooting tools or dashboards, which means that Kubernetes administrators must rely on solutions such as Calico to fulfill this requirement.

In this blog post, we will explore how to use Calico dashboards to gain a comprehensive overview of DNS within a Kubernetes cluster.

Both Calico Enterprise and Calico Cloud offer a preconfigured DNS Dashboard, as well as the flexibility for users to create their own custom DNS Dashboards.

DNS logs

Before we explore the DNS Dashboard, it is important to understand the origin of the information that is presented within the dashboard. Calico improves visibility and monitoring capabilities in Kubernetes clusters by deploying agents on every node.

These agents actively observe DNS requests and responses, capturing and logging the data in a dedicated location: /var/log/calico/dnslogs on the respective nodes.

Alongside the DNS monitoring agents, Calico also installs a log shipping agent (fluentd) on each node. This fluentd agent securely transfers the DNS logs from the nodes to a centralized Elasticsearch instance, which provides centralized log management and supplies the data shown in the DNS Dashboard.
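To make the log pipeline described above more concrete, here is a minimal Python sketch that reads DNS log entries of the kind the agents write on each node. It assumes that each line is a JSON object and that the fields are named client_name, qname, qtype, and rcode. Those field names and the sample lines are assumptions for illustration only, so check them against the log files on your own nodes before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class DnsLogEntry:
    client: str  # pod that issued the query
    qname: str   # name that was queried
    qtype: str   # requested record type, for example "A"
    rcode: str   # response code, for example "NoError"


def parse_dns_logs(lines: Iterable[str]) -> Iterator[DnsLogEntry]:
    """Yield one entry per well-formed JSON object line; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore partially written or corrupted lines
        if not isinstance(raw, dict):
            continue
        # Field names below are assumed, not taken from the Calico documentation.
        yield DnsLogEntry(
            client=raw.get("client_name", "unknown"),
            qname=raw.get("qname", ""),
            qtype=raw.get("qtype", ""),
            rcode=raw.get("rcode", ""),
        )


# Hypothetical sample lines standing in for a file under /var/log/calico/dnslogs.
SAMPLE_LINES = [
    '{"client_name": "multi", "qname": "www.microsoft.com", "qtype": "A", "rcode": "NoError"}',
    '{"client_name": "multi", "qname": "www.tigera.io", "qtype": "A", "rcode": "NoError"}',
    "not json",
]

entries = list(parse_dns_logs(SAMPLE_LINES))
for entry in entries:
    print(entry.client, entry.qname, entry.qtype, entry.rcode)
```

On a node, the same function could be fed an open file handle instead of SAMPLE_LINES, because it only needs an iterable of strings.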

DNS Dashboard

Calico Enterprise and Calico Cloud both offer a comprehensive DNS dashboard used to gain deeper insights into the DNS traffic within your environment.

Fig 1: Default DNS Dashboard

Calico provides a pre-built DNS dashboard (shown above) that presents a comprehensive overview of the DNS statistics for the cluster. This dashboard enables DevOps and application teams to quickly identify DNS health and performance issues. The dashboard offers various analytics, including:

  • Grouping DNS requests by the type of requested resource record.
  • Identifying DNS response codes to distinguish successful and erroneous DNS resolution attempts.
  • Monitoring external domain resolution to track connections to services outside the cluster.
  • Analyzing the rate of DNS queries to identify potential performance bottlenecks.
  • Measuring DNS response latency to pinpoint application performance issues.

With these analytics, the Calico DNS dashboard empowers teams to efficiently monitor and troubleshoot DNS-related aspects of their applications and infrastructure.
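The analytics listed above are aggregations over individual DNS log records. The following Python sketch shows roughly how such figures could be computed by hand. The record layout (qname, qtype, rcode, latency_ms), the sample data, and the rule that any name not ending in .cluster.local counts as external are all assumptions made for illustration; this is not how the dashboard itself is implemented.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"qname": "kubernetes.default.svc.cluster.local", "qtype": "A", "rcode": "NoError", "latency_ms": 1.2},
    {"qname": "www.microsoft.com", "qtype": "A", "rcode": "NoError", "latency_ms": 18.5},
    {"qname": "www.microsoft.com", "qtype": "AAAA", "rcode": "NoError", "latency_ms": 21.0},
    {"qname": "missing.default.svc.cluster.local", "qtype": "A", "rcode": "NXDomain", "latency_ms": 0.9},
]


def summarize(records, cluster_suffix=".cluster.local"):
    """Group DNS records the way a DNS dashboard might: by type, code, and scope."""
    by_qtype = Counter(r["qtype"] for r in records)
    by_rcode = Counter(r["rcode"] for r in records)
    # Treat any name outside the assumed cluster domain as an external query.
    external = sum(
        1 for r in records if not r["qname"].rstrip(".").endswith(cluster_suffix)
    )
    return {
        "total": len(records),
        "by_qtype": dict(by_qtype),
        "by_rcode": dict(by_rcode),
        "external": external,
        "internal": len(records) - external,
        "mean_latency_ms": mean(r["latency_ms"] for r in records) if records else 0.0,
    }


summary = summarize(RECORDS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

A rising share of non-NoError response codes or a growing mean latency in a summary like this is the kind of signal the dashboard surfaces visually.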

Below is an illustration of a DNS dashboard for a Pod named “multi” that established a connection to www.microsoft.com. From within this Pod, we also attempted to connect to www.tigera.io and microsoft.com. However, both of these attempts failed due to a Calico DNS Policy in effect, which permits this pod access only to www.microsoft.com.

You can find more information about Calico DNS Policy here.
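As a rough idea of what a policy like the one in this example could look like, the Python sketch below builds a Calico NetworkPolicy manifest that allows egress only to www.microsoft.com and prints it as JSON, which kubectl accepts. The policy name, namespace, tier, and the app == 'multi' selector are hypothetical, and the extra rule that allows UDP port 53 is an assumption so that the pod can still perform DNS lookups. Treat it as a sketch to compare against the Calico DNS Policy documentation rather than the exact policy used here.

```python
import json


def build_dns_policy(name, namespace, pod_selector, allowed_domains):
    """Return a Calico NetworkPolicy dict that limits egress to the given domains."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tier": "default",          # assumed tier
            "selector": pod_selector,   # which pods the policy applies to
            "types": ["Egress"],
            "egress": [
                # Assumed helper rule: allow DNS queries so names can be resolved.
                {"action": "Allow", "protocol": "UDP", "destination": {"ports": [53]}},
                # Allow traffic only to IPs learned for the listed domain names.
                {"action": "Allow", "destination": {"domains": list(allowed_domains)}},
            ],
        },
    }


policy = build_dns_policy(
    name="default.multi-allow-microsoft",  # hypothetical name
    namespace="default",                   # hypothetical namespace
    pod_selector="app == 'multi'",         # hypothetical label
    allowed_domains=["www.microsoft.com"],
)
print(json.dumps(policy, indent=2))
```

Because the domain match is exact in this sketch, microsoft.com and www.tigera.io do not match the allow rule, which is consistent with the failed attempts described above.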

Other information we can find in this Dashboard:

  • Total DNS requests
  • DNS Responses
  • DNS Internal/External Queries
  • DNS Latency
  • DNS Internal/External Queries by Service
  • DNS Responses by Service

Fig 2: DNS Dashboard with filter on “multi” pod

For example, if a Kubernetes service fails, you can determine which service was queried and then investigate that service further. Categorizing queries and responses by service significantly streamlines troubleshooting and speeds up the resolution of issues.

Customization options

Calico lets you tune its DNS handling with the following options:

DNS Trusted Servers

By default, Calico trusts the Kubernetes cluster’s DNS service (kube-dns or CoreDNS). To change the default, use the FelixConfiguration resource to specify the trusted DNS servers.

For host endpoints you will need to add the IP addresses that the cluster nodes use for DNS resolution.
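The Python sketch below builds a merge patch for the FelixConfiguration resource that adds a node resolver to the trusted list. It assumes the field is called dnsTrustedServers and that entries are either a k8s-service:<name> reference or a plain IP address, which is my understanding of the FelixConfiguration reference; verify both against the documentation for your Calico version. The address 10.0.0.2 is a hypothetical resolver IP, and the sketch deliberately accepts only bare IPs without a port.

```python
import ipaddress
import json

K8S_SERVICE_PREFIX = "k8s-service:"


def trusted_dns_patch(servers):
    """Build a merge patch that sets the trusted DNS servers (assumed field name)."""
    checked = []
    for server in servers:
        if not server.startswith(K8S_SERVICE_PREFIX):
            # Raises ValueError if the entry is not a valid IPv4 or IPv6 address.
            ipaddress.ip_address(server)
        checked.append(server)
    return {"spec": {"dnsTrustedServers": checked}}


# Keep the cluster DNS service and add a hypothetical node resolver.
patch = trusted_dns_patch(["k8s-service:kube-dns", "10.0.0.2"])
print(json.dumps(patch))
```

Assuming the cluster-wide resource is named default, the printed JSON could then be applied with a command along the lines of kubectl patch felixconfiguration default --type merge --patch '<printed JSON>'.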

DNSExtraTTL

Another way to enhance your DNS configuration is the DNSExtraTTL setting. Many applications tend to overlook the DNS Time-to-Live (TTL), and this is where DNSExtraTTL becomes useful.

With DNS Trusted Servers, you specify the IP addresses or Services of the DNS servers that Calico should trust. DNSExtraTTL complements this by keeping the IP addresses learned from DNS responses valid for an additional period beyond the advertised TTL, so traffic from client applications that disregard the TTL and keep communicating with an outdated destination is not blocked.
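The effect of an extra TTL can be pictured with a small conceptual model in Python. It is not Calico's implementation: the LearnedMapping type, the is_still_allowed function, and the numbers are all hypothetical and only illustrate why a grace period helps clients that ignore the TTL. The address 203.0.113.10 comes from a range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnedMapping:
    domain: str
    ip: str
    learned_at: float  # time the DNS response was seen, in seconds
    ttl: float         # TTL advertised in the DNS response, in seconds


def is_still_allowed(mapping: LearnedMapping, now: float, extra_ttl: float = 0.0) -> bool:
    """Return True while the learned IP is within its TTL plus the grace period."""
    return now <= mapping.learned_at + mapping.ttl + extra_ttl


mapping = LearnedMapping("www.microsoft.com", "203.0.113.10", learned_at=0.0, ttl=30.0)

# A client that ignores the 30-second TTL and still uses the IP after 45 seconds.
without_grace = is_still_allowed(mapping, now=45.0)
with_grace = is_still_allowed(mapping, now=45.0, extra_ttl=60.0)
print(without_grace)  # False: the mapping has expired
print(with_grace)     # True: the extra 60 seconds keeps it valid
```

The trade-off is that a larger grace period keeps outdated addresses allowed for longer, so the value is best kept as small as your applications tolerate.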

Conclusion

Calico offers a wide range of features that enable real-time and detailed visibility into DNS traffic within the cluster and towards external DNS servers. This capability greatly assists in promptly identifying, isolating, and resolving DNS errors.

In Kubernetes, DNS errors can have a detrimental impact on application performance and negatively affect the end-user experience. Without proper observability solutions, DevOps and application teams may spend days trying to identify intermittent DNS issues.

Ready to try Calico’s dashboards? Sign up for a free trial of Calico Cloud.
