Integrating Prometheus AlertManager with PagerDuty in Calico

In the fast-paced world of Kubernetes, the performance and reliability of the underlying infrastructure, such as container and Kubernetes networking, are crucial. One key to achieving both is managing alerts and notifications effectively. This blog post explains why configuring alerts matters in a Kubernetes environment, particularly for Calico Enterprise and Calico Cloud, which provide Kubernetes workload networking, security, and observability. Prometheus is embedded in Calico, and by integrating the Prometheus AlertManager with PagerDuty, organizations can improve their observability and incident response time, so that critical alerts are promptly communicated to the appropriate individuals or teams. This integration streamlines alert handling and enables automated service workflows that lead to quicker responses and issue resolution.

Discover the benefits of this integration and learn how to set it up seamlessly. Understand the importance of getting the alerts configured and sent from Calico Enterprise and Cloud.

Importance of getting alerts in a Kubernetes environment

In the dynamic and distributed world of Kubernetes, where applications are intricately orchestrated, one crucial element in achieving optimal performance and reliability is the effective management of alerts and notifications for the Kubernetes platform and associated ecosystem components such as Calico. These alerts serve as an early warning system that lets organizations promptly detect and respond to potential issues, preventing minor glitches from escalating into major outages.

Early detection and response are not just reactive measures; they are proactive investments in minimizing disruptions and maximizing uptime. By catching issues early on, organizations can identify and solve the root cause in the early stages, reducing the Mean Time To Recover (MTTR) and ensuring the high availability of applications. This translates to seamless user experiences and uninterrupted business operations, fostering customer satisfaction and driving business success.

Moreover, a robust alerting system acts as a guardian of reliability in Kubernetes environments. It continuously watches key metrics and configurations, acting as a vigilant sentinel against anomalies and potential vulnerabilities. By identifying these threats before they can cause significant damage, organizations can take preemptive actions, minimizing the risk of downtime and data loss. This proactive approach not only safeguards the integrity and security of critical systems but also contributes to the overall stability and resilience of the IT infrastructure.

Benefits of integrating AlertManager with PagerDuty

Integrating AlertManager with PagerDuty brings several advantages for incident response in Calico environments.

The integration helps organizations combat alert fatigue by filtering and grouping alerts before they reach PagerDuty. This minimizes unnecessary alerts and lets teams focus on the most pressing issues, which improves response times and productivity.

Beyond alert management, PagerDuty offers a comprehensive suite of incident response capabilities that perfectly complement AlertManager’s alerting functionality. These capabilities encompass customizable notification channels, flexible escalation policies, and seamless integration with various IT Service Management (ITSM) and collaboration tools. By leveraging PagerDuty’s incident management features, organizations can streamline their workflows, automate response actions, and establish a holistic and efficient approach to incident resolution.

In short, integrating AlertManager with PagerDuty makes incident response in Calico environments more efficient and effective: critical issues are resolved promptly, which helps keep Kubernetes clusters performant and reliable.
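
As a rough illustration of filtering and grouping on the AlertManager side, the sketch below writes a route that forwards only alerts labeled severity=critical to PagerDuty and batches them by alert name. It is a sketch under assumptions, not Calico's shipped configuration: the file name alertmanager-config-routed.yaml and the do-nothing 'null' receiver are hypothetical, the matchers syntax assumes Alertmanager 0.22 or later, and <INTEGRATION_KEY> is a placeholder for a real PagerDuty integration key. The severity label is the same one set by the rule in the installation steps later in this post.

```shell
# Hypothetical file name; adjust to your own layout.
CONFIG_FILE="alertmanager-config-routed.yaml"

# The quoted 'EOF' keeps the shell from expanding anything inside the YAML.
cat > "$CONFIG_FILE" <<'EOF'
route:
  # Alerts that match no child route go to a receiver that does nothing.
  receiver: 'null'
  # Alerts sharing an alertname are batched into one notification.
  group_by: ['alertname']
  group_wait: 30s
  routes:
  # Only alerts labeled severity=critical are forwarded to PagerDuty.
  - matchers:
    - severity="critical"
    receiver: 'pagerduty'
receivers:
- name: 'null'
- name: 'pagerduty'
  pagerduty_configs:
  - routing_key: '<INTEGRATION_KEY>'
EOF

echo "wrote $CONFIG_FILE"
```

Sending everything that is not critical to an empty receiver is only one possible design; a lower-urgency receiver such as email could take its place.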

PagerDuty’s role

PagerDuty is an incident management platform that helps organizations of all sizes respond to and resolve incidents quickly and efficiently. With PagerDuty, you can create alerts for your Calico cluster and have them sent to the right people on your team, no matter where they are. PagerDuty also provides a central place to manage all of your alerts, so you can easily see what’s happening and take action when necessary.

One of the key benefits of PagerDuty is its ability to filter and group alerts. This can help to reduce the number of unnecessary alerts that are sent to your team, so that they can focus on the most important issues. PagerDuty also provides a variety of notification channels, so that you can be sure that your team is alerted in the way that works best for them.

PagerDuty is a powerful tool that can help you improve the reliability and availability of your Kubernetes cluster. By integrating AlertManager with PagerDuty, you can ensure that critical alerts are sent to the right people in a timely manner so that you can take action quickly and resolve incidents before they cause significant damage. This eliminates the need for constant monitoring of multiple systems and significantly streamlines the alert management process.

Installation and configuration

In order to integrate Prometheus AlertManager with PagerDuty in Calico, you will need to follow the steps below:

  1. Create a PagerDuty account.
  2. After logging in to the PagerDuty portal, create a service to receive the alerts from AlertManager:
    1. Go to Services and click “New Service”.
    2. Enter a name for the service and, optionally, a description.
    3. Leave the default settings and click “Next”.
    4. You can customize the notifications. This example keeps the defaults.
    5. In “Integrations”, select “Events API V2” and click “Create Service”.
    6. After creating the service, go to the “Integrations” tab and copy the “Integration Key”, which will be needed later.
  3. Create a network policy that denies ICMP packets from pods labeled run=multitool:
    1. kubectl apply -f - <<EOF
      apiVersion: projectcalico.org/v3
      kind: NetworkPolicy
      metadata:
       labels:
         projectcalico.org/tier: default
       name: default.multitool
       namespace: default
      spec:
       egress:
       - action: Deny
         destination: {}
         protocol: ICMP
         source: {}
       - action: Allow
         destination:
           namespaceSelector: all()
           ports:
           - 53
           selector: k8s-app == "kube-dns"
         protocol: UDP
         source: {}
       order: -100
       selector: run == "multitool"
       tier: default
       types:
       - Egress
      EOF
      
  4. Create the multitool pod with the label run=multitool. It pings www.google.com continuously, and the network policy applied in the previous step denies those packets:
    1. kubectl apply -f - <<EOF
      apiVersion: v1
      kind: Pod
      metadata:
        creationTimestamp: null
        labels:
          run: multitool
        name: multitool
      spec:
        containers:
        - image: wbitt/network-multitool
          name: multitool
          resources: {}
          command: ["/bin/sh"]
          args: ["-c", "ping -i 0.1 www.google.com"]
        dnsPolicy: ClusterFirst
        restartPolicy: Always
      EOF
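  5. Apply the custom Prometheus rule, which triggers an alert when the per-second rate of denied packets, measured over a 10-second window, stays above 1 for 1 minute. The heredoc delimiter is quoted so that the shell does not expand the $labels template variables:
    1. kubectl apply -f - <<'EOF'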
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        labels:
          prometheus: calico-node-prometheus
          role: tigera-prometheus-rules
        name: tigera-prometheus-dp-rate-custom
        namespace: tigera-prometheus
      spec:
        groups:
        - name: calico.rules
          rules:
          - alert: DeniedPacketsRate
            annotations:
              description: '{{$labels.instance}} with calico-node pod {{$labels.pod}} has
                been denying packets at a fast rate {{$labels.sourceIp}} by policy {{$labels.policy}}.'
              summary: Instance {{$labels.instance}} - Packets denied
            expr: rate(calico_denied_packets[10s]) > 1
            for: 1m
            labels:
              severity: critical
      EOF
  6. Create the AlertManager config file, replacing the “routing_key” value with the “Integration Key” saved in step 2.6:
    1. cat <<EOF > alertmanager-config.yaml
      global:
        resolve_timeout: 5m
        http_config:
          tls_config:
            insecure_skip_verify: true
      route:
        group_by: ['job']
        group_wait: 30s
        group_interval: 1m
        repeat_interval: 2m
        receiver: 'pagerduty'
      receivers:
      - name: 'pagerduty'
        pagerduty_configs:
        - routing_key: '<INTEGRATION_KEY>'
      EOF
  7. Export the alertmanager-calico-node-alertmanager secret from the tigera-operator namespace:
    1. kubectl -n tigera-operator get secrets alertmanager-calico-node-alertmanager -o yaml > alertmanager-secret.yaml
  8. Store the base64-encoded contents of alertmanager-config.yaml in a variable named ALERT_CONFIG:
    1. ALERT_CONFIG=$(base64 -w 0 alertmanager-config.yaml)
  9. Replace the alertmanager.yaml value in the exported secret with the contents of ALERT_CONFIG. Base64 output can contain “/”, so the sed expression uses “|” as its delimiter:
    1. sed -i "s|alertmanager.yaml:.*|alertmanager.yaml: ${ALERT_CONFIG}|" alertmanager-secret.yaml
  10. Apply the alertmanager-secret.yaml file:
    1. kubectl -n tigera-operator apply -f alertmanager-secret.yaml
  11. In the PagerDuty portal, you should see the alert triggered by AlertManager. In this example, it is sent to all members of tigera-monitoring-ep. You can also build workflows and configure PagerDuty to send triggered alerts through SMS, email, and other channels.
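
Steps 8 and 9 can be rehearsed on throwaway files before touching the exported secret. The sketch below is only an illustration under stated assumptions: it works in a temporary directory, uses a trimmed stand-in config and a dummy Secret manifest (neither comes from a real cluster), and relies on GNU base64 and GNU sed, as the commands in the steps do. It encodes the config, writes it into the manifest with sed (using “|” as the delimiter, since base64 output may contain “/”), then decodes the result and compares it with the original.

```shell
# Everything lives in a scratch directory; nothing here talks to a cluster.
WORKDIR="$(mktemp -d)"

# Stand-in for the config from step 6 (placeholder key, not a real one).
cat > "$WORKDIR/alertmanager-config.yaml" <<'EOF'
route:
  receiver: 'pagerduty'
receivers:
- name: 'pagerduty'
  pagerduty_configs:
  - routing_key: '<INTEGRATION_KEY>'
EOF

# Stand-in for the secret exported in step 7; the data value is a dummy.
cat > "$WORKDIR/alertmanager-secret.yaml" <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-calico-node-alertmanager
  namespace: tigera-operator
data:
  alertmanager.yaml: b2xkLWNvbmZpZw==
EOF

# Step 8: encode the config on a single line.
ALERT_CONFIG="$(base64 -w 0 "$WORKDIR/alertmanager-config.yaml")"

# Step 9: swap the encoded value into the manifest.
sed -i "s|alertmanager.yaml:.*|alertmanager.yaml: ${ALERT_CONFIG}|" "$WORKDIR/alertmanager-secret.yaml"

# Read the value back out of the manifest and decode it.
DECODED="$(sed -n 's|^  alertmanager.yaml: ||p' "$WORKDIR/alertmanager-secret.yaml" | base64 -d)"

if [ "$DECODED" = "$(cat "$WORKDIR/alertmanager-config.yaml")" ]; then
  echo "round trip OK"
else
  echo "round trip FAILED" >&2
fi
```

Against a real cluster, a similar check after step 10 could read the stored value back with kubectl -n tigera-operator get secret alertmanager-calico-node-alertmanager -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d and compare it with alertmanager-config.yaml.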

Exposing the AlertManager portal through the calico-node-alertmanager service (HTTP port 9093) is highly recommended, because it lets you inspect the AlertManager configuration and the active alerts. How to expose it depends on the Kubernetes environment, so it is out of the scope of this blog.
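
For a quick local look that does not require exposing the service at all, kubectl port-forward is one generic option. The helpers below are a minimal sketch, not a recommended production setup: the function names am_port_forward and am_url are hypothetical, and the tigera-prometheus namespace for the calico-node-alertmanager service is an assumption that may not match your installation. The two API paths shown in the usage comments belong to AlertManager's v2 HTTP API.

```shell
# Assumption: the calico-node-alertmanager service lives in the
# tigera-prometheus namespace; override AM_NAMESPACE if yours differs.
AM_NAMESPACE="${AM_NAMESPACE:-tigera-prometheus}"
AM_LOCAL_PORT="${AM_LOCAL_PORT:-9093}"

# Forward a local port to the service's HTTP port 9093 (blocks until Ctrl-C).
am_port_forward() {
  kubectl -n "$AM_NAMESPACE" port-forward svc/calico-node-alertmanager "${AM_LOCAL_PORT}:9093"
}

# Build a URL for the forwarded AlertManager, e.g. am_url /api/v2/alerts
am_url() {
  printf 'http://localhost:%s%s\n' "$AM_LOCAL_PORT" "$1"
}

# Usage, in two terminals:
#   am_port_forward
#   curl -s "$(am_url /api/v2/status)"   # status, including the loaded config
#   curl -s "$(am_url /api/v2/alerts)"   # alerts currently held by AlertManager
```

The same forwarded port also serves the AlertManager web UI at http://localhost:9093 while the port-forward is running.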

Conclusion

In conclusion, integrating Prometheus AlertManager with PagerDuty in Calico provides organizations with a comprehensive and efficient solution for managing alerts in Kubernetes environments. By leveraging Calico metrics scraped by Prometheus and PagerDuty’s robust notification and incident management capabilities, organizations can ensure that critical alerts are promptly communicated to the right individuals or teams, enabling quicker response and resolution of issues. The integration enhances the observability and reliability of Calico clusters by providing direct delivery of alerts, robust filtering and grouping mechanisms, and seamless integration with other systems involved in the incident response workflow. By leveraging the combined strengths of Calico and PagerDuty, organizations can streamline their alert handling processes, improve operational efficiency, and effectively manage incidents in their Kubernetes environments.

