Troubleshooting DNS issues in Kubernetes: Investigate and reduce NXDOMAIN (domain does not exist) responses

An NXDOMAIN response indicates that a queried domain does not exist. In Kubernetes, frequent NXDOMAIN responses pose significant challenges, impacting application functionality, service communication, and overall cluster stability. Investigating them is vital for sustaining the reliability, performance, and security of a containerized environment.

The role of DNS observability in Kubernetes

Service Discovery and Communication:

Kubernetes heavily relies on DNS for service discovery, where each service has a DNS name enabling communication. Frequent NXDOMAIN responses disrupt communication, affecting the functionality of applications and the interplay between microservices.

Pod-to-Pod Communication:

Pods in a Kubernetes cluster typically locate each other, and the services they depend on, through DNS. Persistent NXDOMAIN responses can indicate a breakdown in pod-to-pod communication and need to be investigated to restore seamless interaction and functionality.

Application Availability:

Applications often depend on external services, APIs, or databases. NXDOMAIN responses during external resource access lead to performance degradation or application failures, underscoring the need to investigate and ensure overall availability.

Security Considerations:

DNS plays a critical role in securing communication within Kubernetes. Unresolved NXDOMAIN responses may expose security vulnerabilities, allowing unauthorized entities to masquerade or gain unauthorized access. Investigation is vital for maintaining cluster security.

Troubleshooting and Diagnostics:

NXDOMAIN responses serve as early indicators of broader issues. Investigating these responses aids troubleshooting and diagnostics, preventing cascading failures and preserving the cluster’s health.

Monitoring and Observability:

Understanding NXDOMAIN response frequency and patterns is essential for effective monitoring. Investigation ensures proactive resolution, safeguarding the performance and health of the DNS infrastructure in the Kubernetes cluster.

Investigating NXDOMAIN responses

In Kubernetes, investigating NXDOMAIN responses can be tricky because the cluster uses several internal subdomains for services and other resources, and the resolver tries those subdomains before it looks up a name as written. Two rules explain this behavior:

  1. DNS resolvers skip the internal lookups only for Fully Qualified Domain Names (FQDNs).
  2. To be treated as an FQDN, a domain name must end with a dot (for example, tigera.io.).

For these reasons, workloads commonly connect to external domains written without the trailing dot, which forces the container (and the host) to generate internal lookups that are bound to fail. For example, “curl http://kubernetes.io” generates these lookups:

A kubernetes.io.my-app.svc.cluster.local.
AAAA kubernetes.io.my-app.svc.cluster.local.
A kubernetes.io.svc.cluster.local.
AAAA kubernetes.io.svc.cluster.local.
A kubernetes.io.cluster.local.
AAAA kubernetes.io.cluster.local.
A kubernetes.io.ca-central-1.compute.internal.
AAAA kubernetes.io.ca-central-1.compute.internal.
A kubernetes.io.
AAAA kubernetes.io.

All of them except the last two return an NXDOMAIN response, flooding the cluster with useless traffic and events.
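The expansion above can be reproduced with a minimal sketch of how a glibc-style stub resolver builds its list of candidate names. The function name `candidate_names` is hypothetical, the search domains are taken from the example above, and the default of ndots 5 is what Kubernetes normally writes into a pod's resolv.conf. The real resolver stops at the first successful answer, so this list is the worst case.

```python
def candidate_names(name, search_domains, ndots=5):
    """Return the names a glibc-style stub resolver would try, in order.

    Hypothetical helper for illustration; it models the search-list
    logic only and does not send any DNS queries.
    """
    if name.endswith("."):
        # Trailing dot: the name is already an FQDN, so the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as written.
    return expanded + [absolute]


# Search domains assumed from the lookups listed above (namespace "my-app").
SEARCH = [
    "my-app.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ca-central-1.compute.internal",
]

if __name__ == "__main__":
    for qname in candidate_names("kubernetes.io", SEARCH):
        for record_type in ("A", "AAAA"):
            print(record_type, qname)
```

Calling `candidate_names("tigera.io.", SEARCH)` returns only the name itself, which is why a trailing dot avoids the internal lookups.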

Using Calico Enterprise or Calico Cloud, you can open the DNS dashboard in Kibana and spot the workloads that generate the most NXDOMAIN responses.

1. Access the DNS dashboard as shown here:

2. Filter for NXDOMAIN responses and exclude noise coming from core components, using this filter:

NOT client_name_aggr: *tigera-* and NOT client_name_aggr: *calico-* and NOT client_namespace: tigera-* and NOT qname: tigera-secure-es-* and NOT client_name_aggr: ebs-csi-controller-*

3. Spot the workload that is generating the most events:

4. Inspect the traffic using the packet capture feature from Service Graph:

5. Use Wireshark to open the “.pcap” file:

This specific workload continuously runs “curl” against the following domain names:

google.com
kubernetes.io
tigera.io.

From the screenshot above, you can see that the first two domains generate internal lookups, while the third one (tigera.io.) does not. This is because the resolver in the container recognizes it as an FQDN and queries the name as written, rather than searching the internal subdomains first.

6. Add the “ndots” option to the deployment to “force” the container to skip internal lookups for all domain names that contain at least the specified number of dots:

apiVersion: apps/v1
kind: Deployment
…
spec:
  replicas: 6
  selector:
    matchLabels:
      run: my-app
  template:
    metadata:
      labels:
        run: my-app
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "1"
…

7. Check the packet capture again to confirm that the deployment no longer generates internal lookups:

As you can see, internal lookups for “kubernetes.io” and “google.com” are no longer generated, drastically reducing the number of NXDOMAIN responses.
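The effect of the change can be estimated with a rough sketch that counts the search-list queries sent before a name is tried as written. It assumes glibc-style resolver behavior, one A and one AAAA query per candidate name, the four search domains seen in the earlier lookups, and that each name resolves only as an external name. The helper `wasted_lookups` is hypothetical.

```python
# Search domains assumed from the lookups captured earlier in this post.
SEARCH_DOMAINS = [
    "my-app.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ca-central-1.compute.internal",
]

QUERIES_PER_NAME = 2  # one A and one AAAA query per candidate name


def wasted_lookups(name, ndots, search_domains=SEARCH_DOMAINS):
    """Count the search-list queries issued before the name is tried as written.

    Hypothetical helper: assumes the name exists only externally, so every
    search-list query ends in NXDOMAIN.
    """
    if name.endswith(".") or name.count(".") >= ndots:
        # FQDN, or enough dots: the name is tried as written first and resolves.
        return 0
    return len(search_domains) * QUERIES_PER_NAME


if __name__ == "__main__":
    for ndots in (5, 1):
        for name in ("google.com", "kubernetes.io", "tigera.io."):
            print(f"ndots={ndots} {name}: {wasted_lookups(name, ndots)} failing queries")
```

One trade-off to keep in mind: with ndots set to 1, an in-cluster name that contains a dot (for example, a hypothetical my-service.other-namespace) is also tried as written first, so it produces one failing lookup before the search list is consulted.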

Summary

Investigating NXDOMAIN responses is pivotal for Kubernetes’ reliability, security, and overall health. These events provide early warnings, prompting proactive measures to sustain application operations, service communication, and the integrity of the Kubernetes environment. A swift and thorough approach to NXDOMAIN response investigation is essential for successful containerized application deployment and management.

Calico proves instrumental in troubleshooting DNS issues by offering a powerful suite of observability features. With its intuitive interface and advanced analytics, Calico provides comprehensive visibility and facilitates efficient debugging, ensuring the smooth functioning of DNS services. As organizations navigate the complexities of their network infrastructure, Calico enables proactive identification and resolution of issues for a resilient, reliable, and secure Kubernetes environment.

