Achieving highly available (HA) Redis Kubernetes clusters with Calico Clustermesh in Microsoft AKS

According to the recent Datadog report on real-world container usage, Redis is among the top 5 technologies used in containerized workloads running on Kubernetes.

A Redis database is deployed across multi-region clusters to be highly available (HA) to a microservices application. However, while Kubernetes defines how networking and security policy are deployed and configured within a single cluster, it is challenging to enable pod-level inter-cluster communication, enforce security policies, and connect to services running in pods across multiple clusters.

Calico Clustermesh provides an elegant solution for making multiple Redis clusters highly available without any overhead. By default, Kubernetes pods can only see pods within their own cluster.

Using Calico Clustermesh, you can grant access to other clusters and the applications they are running. Calico Clustermesh comes with Federated Endpoint Identity and Federated Services.

Federated endpoint identity

Calico federated endpoint identity and federated services are implemented in Kubernetes at the network layer. To apply fine-grained network policy between multiple clusters, the pod source and destination IPs must be preserved. The prerequisite for enabling federated endpoints is therefore that the clusters are designed with common networking across clusters (routable pod IPs) and no encapsulation.

Federated services

Federated services work with federated endpoint identity, providing cross-cluster service discovery for a local cluster. Federated services use the Tigera Federated Services Controller to federate all Kubernetes endpoints (workload and host endpoints) across all of the clusters. The Federated Services Controller accesses service and endpoint data in the remote clusters directly through the Kubernetes API.

HA and service federation use case

Overview

Let’s get started. The setup will have a client application and a target application:

  1. A client application that needs to connect and transact constantly with another critical application/service without which the client application essentially ceases to fulfill its primary business function.
    For the client application, we will be using the Online Boutique/Hipstershop demo microservices application developed by Google that requires a connection to a Redis database to create necessary tables and store the state of the online store application.
  2. A target application/service called by the client application that needs continuous uptime and high availability, which we will be federating across multiple clusters.
    The application/service we will be federating will be Redis deployed in an Active-Active configuration across multiple clusters.

Client microservices application: Hipstershop

How the Hipstershop ‘cartservice’ utilizes Redis

In the diagram above, the ‘Redis cache’ is the piece we are interested in federating. The ‘cart’ or ‘cartservice’ pod is responsible for consuming Redis as a K8s Service. The ‘cartservice’ pod does this by utilizing a container environment variable called REDIS_ADDR pointing to the DNS name of the Redis K8s Service in its Deployment spec as shown below: 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cartservice
spec:
  selector:
    matchLabels:
      app: cartservice
  template:
    metadata:
      labels:
        app: cartservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      containers:
      - name: server
        image: gcr.io/google-samples/microservices-demo/cartservice:v0.5.1
        ports:
        - containerPort: 7070
        env:
        - name: REDIS_ADDR
          value: "redis-cart:6379"
        resources:
          requests:
            cpu: 200m
            memory: 64Mi
          limits:
            cpu: 300m
            memory: 128Mi
        readinessProbe:
          initialDelaySeconds: 15
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:7070", "-rpc-timeout=5s"]
        livenessProbe:
          initialDelaySeconds: 15
          periodSeconds: 10
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:7070", "-rpc-timeout=5s"]

The Redis service being consumed is a pod/svc that is also deployed as part of the default config. We will be replacing this single basic Redis pod deployment with a full Redis Enterprise Cluster deployment backed by a database service installed on a multi-region, multi-cluster setup and federating out the database service.

Lab environment and setup

Reference Github Repository

The environment can be set up in any Azure account using the Azure Cloud Shell.

Getting Started

Azure Components

The first step is to understand the Azure AKS multi-cluster environment we will be setting up and working in.

  • Two regions for the Azure resources/resource groups to be deployed in, westus and canadacentral
  • One vnet with at least one subnet deployed per region
  • Vnet peering enabled and set up between the vnets in the two regions
  • Azure CNI will be used to allow for pods to get IPs allocated from the Vnets and to allow for those pod IPs to be routable between Peered Vnets
  • Deploy an AKS cluster in each region with its respective config (a sketch of these steps with the Azure CLI follows below)
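
Below is a minimal sketch of these steps using the Azure CLI. The resource group, vnet, subnet, and cluster names, as well as the address spaces and node counts, are illustrative assumptions; the reference GitHub repository contains the exact configuration used for this lab.

# Resource groups, one per region (all names and address spaces are illustrative)
az group create --name demo-westus --location westus
az group create --name demo-canadacentral --location canadacentral

# One vnet with a subnet in each region; the address spaces must not overlap
az network vnet create --resource-group demo-westus --name vnet-westus \
  --address-prefixes 10.0.0.0/16 --subnet-name aks --subnet-prefixes 10.0.0.0/20
az network vnet create --resource-group demo-canadacentral --name vnet-canadacentral \
  --address-prefixes 10.1.0.0/16 --subnet-name aks --subnet-prefixes 10.1.0.0/20

# Peer the vnets in both directions so pod IPs are routable across regions
az network vnet peering create --resource-group demo-westus --name westus-to-canadacentral \
  --vnet-name vnet-westus --remote-vnet <canadacentral vnet resource ID> --allow-vnet-access
az network vnet peering create --resource-group demo-canadacentral --name canadacentral-to-westus \
  --vnet-name vnet-canadacentral --remote-vnet <westus vnet resource ID> --allow-vnet-access

# One AKS cluster per region using Azure CNI so pods receive routable vnet IPs
az aks create --resource-group demo-westus --name aks-westus --node-count 3 \
  --network-plugin azure --vnet-subnet-id <westus aks subnet resource ID>
az aks create --resource-group demo-canadacentral --name aks-canadacentral --node-count 3 \
  --network-plugin azure --vnet-subnet-id <canadacentral aks subnet resource ID>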

Deploying Redis in active-active HA mode

The Redis on Kubernetes architecture uses an operator-based deployment to deal with the nuances associated with deploying a Redis Enterprise Cluster. The Redis Enterprise API as well as the database services are deployed as pods, with quorum maintained by a minimum of 3 pods per deployment in a cluster. Node taints/tolerations are applied so that, ideally, there is one Redis pod per worker node (requiring 3 nodes).

Detailed architecture documentation from Redis can be found here.

The Active-Active Redis Enterprise Cluster uses an Ingress resource to allow the API and databases to sync. As it is currently difficult to tweak the Redis Operator to decouple from this model, we will create the ingress resources but then federate the ClusterIP Redis database service between the two clusters for the Hipstershop cartservice to point to.

The Redis Enterprise Cluster is then deployed on both clusters, with an identically named namespace created on each. This is a prerequisite for service federation, because kube-dns uses the namespace as part of the service FQDN in Kubernetes.
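
As a quick illustration (the kubectl context names are assumptions for this lab), the namespace can be created with the same name against each cluster before the operator is installed:

# The namespace name must be identical in both clusters for service federation
kubectl --context aks-westus create namespace redis
kubectl --context aks-canadacentral create namespace redis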

In each cluster, a ClusterIP database service (named testdb in this example) is set up, backing a clean, empty Active-Active Redis database that now syncs between the two clusters. This is the target service for the federation.

apiVersion: v1
kind: Service
metadata:
  annotations:
    redis.io/last-keys: '[]'
  creationTimestamp: "2023-04-05T19:54:11Z"
  labels:
    app: redis-enterprise
    federation: "yes"
    redis.io/bdb: "2"
    redis.io/cluster: demo-clustera
  name: testdb
  namespace: redis
  ownerReferences:
  - apiVersion: app.redislabs.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: RedisEnterpriseCluster
    name: demo-clustera
    uid: 485d9f4d-643b-4c0d-90da-07bd6b052b19
  resourceVersion: "518731"
  uid: 14d08a31-cd9b-4aba-a203-ae35bd34f437
spec:
  clusterIP: 10.0.27.178
  clusterIPs:
  - 10.0.27.178
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: redis
    port: 11069
    protocol: TCP
    targetPort: 11069
  selector:
    redis.io/bdb-2: "2"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

In the lab setup with the two clusters, we can check the endpoints to see the IPs of the backing pods for the active Redis svc in each cluster.

westus cluster:

# kubectl get endpoints -n redis | grep testdb                                    
testdb               10.0.0.40:11069

canadacentral cluster:

# kubectl get endpoints -n redis | grep testdb                                                                         
testdb               10.1.0.131:11069                                           

Creating federated endpoints

Before a service can be federated, the clusters first need to be aware of each other and collect each other’s remote pod endpoints, which we call federated endpoints. Calico implements a Custom Resource Definition (CRD) object called RemoteCluster that is deployed on a local cluster to allow it to reference and collect endpoints from a remote cluster. In a 2-cluster scenario, for example, this object needs to be applied on both clusters, where each cluster is able to find, authenticate and successfully pull the endpoints of the pods in the other (remote) cluster.

This consists of a few steps, provided the prerequisite of pod-to-pod routing is enabled and working between the clusters.

For the lab setup, a bringup bash script automates these steps.
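
For reference, this object is called RemoteClusterConfiguration in the current Calico Enterprise/Calico Cloud documentation. A minimal sketch of the object applied on the westus cluster might look like the following, where the secret name and namespace are illustrative and creating the kubeconfig secret plus the required RBAC is part of what the bringup script automates:

apiVersion: projectcalico.org/v3
kind: RemoteClusterConfiguration
metadata:
  # Name of the remote cluster as it will appear in endpoint listings
  name: calico-demo-remote-canadacentral
spec:
  clusterAccessSecret:
    # Secret containing a kubeconfig for the remote (canadacentral) cluster
    name: remote-cluster-secret-canadacentral
    namespace: calico-system
    kind: Secret

An equivalent object pointing back at westus is applied on the canadacentral cluster.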

Finally, once the configs are applied, the list of remote endpoints can be fetched and verified with the calicoq CLI tool, where the endpoints are prefixed with the name of the RemoteCluster object designated for the remote cluster.

Ex: for a RemoteCluster object named calico-demo-remote-canadacentral, the remote Redis pod endpoints may look like this:

Workload endpoint calico-demo-remote-canadacentral/aks-nodepool1-86764462-vmss000000/k8s/redis.demo-clusterb-1/eth0
Workload endpoint calico-demo-remote-canadacentral/aks-nodepool1-86764462-vmss000001/k8s/redis.demo-clusterb-0/eth0
Workload endpoint calico-demo-remote-canadacentral/aks-nodepool1-86764462-vmss000002/k8s/redis.demo-clusterb-2/eth0
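
One way to produce such a listing (assuming calicoq is installed and configured with access to the cluster) is to evaluate a selector that matches everything:

# Lists all workload endpoints known to this cluster; endpoints learned from a
# remote cluster are prefixed with the name of its RemoteCluster object
calicoq eval "all()"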

Deploying the client application: Google Online Boutique demo/Hipstershop

The Google Online Boutique application YAML configuration needs to be modified from the standard configuration so that the cartservice pods point to the ‘testdb’ Redis database service.

The service graphs, viewed from the redis and hipstershop namespace perspectives, show the associations between the Hipstershop services and the Redis testdb svc.

Accessing the Hipstershop frontend

The frontend-external service provides a public LoadBalancer IP to access the frontend. With everything working correctly, the app can be accessed and shows the online store.

~# kubectl get svc -n hipstershop | grep LoadBalancer
frontend-external       LoadBalancer   10.0.24.223   20.1.14.187   80:31464/TCP   19d

Items can be added to a cart, which makes use of the cartservice microservice/app, and purchased in the mock demo app.

For testing the cartservice state, we want to keep items in the cart so we can see how the state of the application is affected when it can no longer query the Redis database service.

Inducing a failure scenario and making Redis unavailable in one cluster

To see the effect of ‘breaking’ the Redis service on the application, we can put Redis into ‘recovery mode’ to induce a failure and watch the frontend break as the ‘cartservice’ becomes unable to reach Redis.
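
One way to do this with the Redis Enterprise operator (an assumption based on the operator’s cluster recovery flow; verify the field against the operator documentation for your version) is to flip the recovery flag on the local RedisEnterpriseCluster resource:

# Put the westus Redis Enterprise Cluster (demo-clustera) into recovery mode so
# its local database service loses its backing endpoints
kubectl patch rec demo-clustera -n redis --type merge -p '{"spec":{"clusterRecovery":true}}'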

This is because at this point, we have not actually created a federated service or made any config changes to the testdb service. Let’s go ahead and do that now.

Federating the Redis database service

As we already have federated endpoints and both clusters are aware of each other’s remote endpoints, we can move to the step of actually federating the Redis testdb service that was initially created.

Service federation requires the following prerequisites/considerations:

  • Since a federated service is a set of services with consolidated endpoints, it looks like a regular K8s Service, but instead of using a pod selector it uses an annotation containing a label selector that must match the label(s) on the service/s it is backing.
  • Only services in the same namespace as the federated service are included. This implies namespace names across clusters are linked (this is a basic premise of federated endpoint identity).

First, we need to label the testdb service in all of our clusters so that the Tigera controller can federate them. Do this in each cluster:

kubectl label svc -n redis testdb federation=yes

Then we apply the federated service YAML config, which defines the testdb-federated service along with the required ‘special’ annotation federation.tigera.io/serviceSelector matching the label we just applied; this tells the Tigera Federated Services Controller which services to target:

apiVersion: v1
kind: Service
metadata:
  name: testdb-federated
  namespace: redis
  annotations:
    federation.tigera.io/serviceSelector: federation == "yes"
spec:
  ports:
    - name: redis
      port: 11069
      protocol: TCP
  type: ClusterIP

Once the service is created and federated, we can see that the federated service has been populated with the pod endpoints of the testdb service active on the other (remote) cluster, even while the local testdb service has no endpoints because the Redis database/svc has been taken down.

On westus, the local testdb service has no endpoints because the Redis svc has been taken down, but the federated testdb-federated service still has the remote pod endpoint available from the canadacentral cluster.

# kubectl get endpoints -n redis | grep testdb                                    
testdb               <none>             19d
testdb-federated     10.1.0.131:11069   18d

The federated endpoints look like:

kubectl get endpoints testdb-federated -n redis -oyaml

apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    federation.tigera.io/serviceSelector: federation == "yes"
  creationTimestamp: "2023-04-06T16:12:39Z"
  name: testdb-federated
  namespace: redis
  resourceVersion: "2901443"
  uid: 7414e1d3-594e-441d-ba7c-669c530cefd0
subsets:
- addresses:
  - ip: 10.1.0.131
    nodeName: aks-nodepool1-20240168-vmss000004
    targetRef:
      kind: Pod
      name: calico-demo-remote-canadacentral/demo-clusterb-2
      namespace: redis
      resourceVersion: "2899484"
      uid: f1e3630d-fffa-4475-8893-d6f1aacb74f6
  ports:
  - name: redis
    port: 11069
    protocol: TCP

Here we can see that the remote cluster calico-demo-remote-canadacentral is still advertising the pod endpoint of its working service to the federated service.

Finally, the cartservice needs to leverage the new testdb-federated service, which can be done by changing its deployment’s REDIS_ADDR environment variable to point to testdb-federated.redis:11069.
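
One way to make this change without editing the full manifest (a sketch assuming the hipstershop namespace and cartservice deployment names used in this lab) is:

# Repoint cartservice at the federated Redis service; this triggers a rollout
kubectl -n hipstershop set env deployment/cartservice REDIS_ADDR=testdb-federated.redis:11069

The resulting cartservice deployment then looks like this: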

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"cartservice","namespace":"hipstershop"},"spec":{"selector":{"matchLabels":{"app":"cartservice"}},"template":{"metadata":{"labels":{"app":"cartservice"}},"spec":{"containers":[{"env":[{"name":"REDIS_ADDR","value":"testdb.redis:11069"}],"image":"gcr.io/google-samples/microservices-demo/cartservice:v0.5.1","livenessProbe":{"exec":{"command":["/bin/grpc_health_probe","-addr=:7070","-rpc-timeout=5s"]},"initialDelaySeconds":15,"periodSeconds":10},"name":"server","ports":[{"containerPort":7070}],"readinessProbe":{"exec":{"command":["/bin/grpc_health_probe","-addr=:7070","-rpc-timeout=5s"]},"initialDelaySeconds":15},"resources":{"limits":{"cpu":"300m","memory":"128Mi"},"requests":{"cpu":"200m","memory":"64Mi"}}}],"serviceAccountName":"default","terminationGracePeriodSeconds":5}}}}
  creationTimestamp: "2023-04-05T20:08:16Z"
  generation: 4
  name: cartservice
  namespace: hipstershop
  resourceVersion: "2753897"
  uid: 8abc3df1-8f68-4301-9875-20fa76535461
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cartservice
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cartservice
    spec:
      containers:
      - env:
        - name: REDIS_ADDR
          value: testdb-federated.redis:11069
        image: gcr.io/google-samples/microservices-demo/cartservice:v0.5.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - -addr=:7070
            - -rpc-timeout=5s
          failureThreshold: 3
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: server
        ports:
        - containerPort: 7070
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - -addr=:7070
            - -rpc-timeout=5s
          failureThreshold: 3
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 128Mi
          requests:
            cpu: 200m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-04-05T20:08:16Z"
    lastUpdateTime: "2023-04-06T16:14:12Z"
    message: ReplicaSet "cartservice-659f98f64f" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-04-24T17:18:00Z"
    lastUpdateTime: "2023-04-24T17:18:00Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 4
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Once this is done, the cartservice pods will update and the Hipstershop web page becomes accessible again, using the remote cluster’s federated Redis service even though the local svc is unavailable, thus ensuring HA.

Now when the web page is refreshed, the app’s cart page should come back up and retain its state with the items in the cart, showing that the Redis database entries were replicated properly to the other cluster and retrieved from there.

Conclusion

Keeping a critical database service like Redis highly available for microservices to consume in a multi-region, multi-cluster configuration can be challenging with Kubernetes’ native single-cluster architecture model. Calico Clustermesh addresses this use case by federating pod endpoints and services for real-time databases and other similar applications.

Want to learn more? Get started with a free Calico Cloud trial.
