etcd crashes in EKS cluster #5

@jishaashokan

Following your article, I searched the Bitnami repo for the etcd chart:

helm search repo bitnami  | grep etcd
bitnami/etcd                                	8.5.11       	3.5.6        	etcd is a distributed key-value store designed ...

I found that Helm chart 8.5.11 provides etcd version 3.5.6, so I upgraded my existing APISIX installation by updating the version in charts.yaml:

helm dependency list ./charts/apisix
NAME                     	VERSION	REPOSITORY                        	STATUS
etcd                     	8.5.11 	https://charts.bitnami.com/bitnami	ok    
apisix-dashboard         	0.6.1  	https://charts.apiseven.com/       	ok    
apisix-ingress-controller	0.11.1 	https://charts.apiseven.com/       	ok

helm upgrade apisix ./charts/apisix --set gateway.type=LoadBalancer --set allow.ipList="{0.0.0.0/0}" --set ingress-controller.enabled=true --namespace ingress-apisix --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix --set gateway.tls.enabled=true --set ingress-controller.config.apisix.adminKey=x --set admin.credentials.admin=xxxxx --set xxxx admin.credentials.viewer=xxxxx --set ingressController.config.apisix.baseURL=http://apisix-admin:9180/apisix/admin --set dashboard.enabled=true

However, etcd still crashes:

mk logs -f apisix-etcd-0
etcd 04:02:06.39 
etcd 04:02:06.39 Welcome to the Bitnami etcd container
etcd 04:02:06.39 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 04:02:06.39 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 04:02:06.39 
etcd 04:02:06.39 INFO  ==> ** Starting etcd setup **
etcd 04:02:06.41 INFO  ==> Validating settings in ETCD_* env vars..
etcd 04:02:06.41 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 04:02:06.41 INFO  ==> Initializing etcd
etcd 04:02:06.41 INFO  ==> Generating etcd config file using env variables
etcd 04:02:06.43 INFO  ==> There is no data from previous deployments
etcd 04:02:06.44 INFO  ==> Adding new member to existing cluster
etcd 04:02:16.59 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:36.68 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:56.76 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:16.84 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:36.91 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:57.00 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:17.08 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:37.15 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:57.27 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:17.33 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:37.43 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:57.53 WARN  ==> Cluster not healthy, not adding self to cluster for now, keeping trying...

These are the events from the Kubernetes cluster:

21m         Warning   FailedPreStopHook                 pod/apisix-etcd-0                              Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(a9c0fe68-6cec-4934-9934-43678e34977f)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 137: , message: ""
21m         Warning   FailedPreStopHook                 pod/apisix-etcd-1                              Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(0e9be7b4-c34a-4992-97cf-a8426766534a)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 137: , message: ""
20m         Warning   FailedMount                       pod/apisix-etcd-2                              Unable to attach or mount volumes: unmounted volumes=[data etcd-jwt-token kube-api-access-jtccd], unattached volumes=[data etcd-jwt-token kube-api-access-jtccd]: timed out waiting for the condition
20m         Normal    NoPods                            poddisruptionbudget/apisix-etcd                No matching pods found
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Claim data-apisix-etcd-1 Pod apisix-etcd-1 in StatefulSet apisix-etcd success
20m         Normal    WaitForFirstConsumer              persistentvolumeclaim/data-apisix-etcd-0       waiting for first consumer to be created before binding
20m         Normal    WaitForFirstConsumer              persistentvolumeclaim/data-apisix-etcd-1       waiting for first consumer to be created before binding
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Pod apisix-etcd-1 in StatefulSet apisix-etcd successful
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Pod apisix-etcd-0 in StatefulSet apisix-etcd successful
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Claim data-apisix-etcd-0 Pod apisix-etcd-0 in StatefulSet apisix-etcd success
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Pod apisix-etcd-2 in StatefulSet apisix-etcd successful
20m         Normal    WaitForFirstConsumer              persistentvolumeclaim/data-apisix-etcd-2       waiting for first consumer to be created before binding
20m         Normal    SuccessfulCreate                  statefulset/apisix-etcd                        create Claim data-apisix-etcd-2 Pod apisix-etcd-2 in StatefulSet apisix-etcd success
20m         Normal    ProvisioningSucceeded             persistentvolumeclaim/data-apisix-etcd-0       Successfully provisioned volume pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7 using kubernetes.io/aws-ebs
20m         Normal    ProvisioningSucceeded             persistentvolumeclaim/data-apisix-etcd-1       Successfully provisioned volume pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de using kubernetes.io/aws-ebs
20m         Normal    ProvisioningSucceeded             persistentvolumeclaim/data-apisix-etcd-2       Successfully provisioned volume pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3 using kubernetes.io/aws-ebs
20m         Normal    Scheduled                         pod/apisix-etcd-0                              Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
20m         Normal    Scheduled                         pod/apisix-etcd-1                              Successfully assigned ingress-apisix/apisix-etcd-1 to ip-172-31-118-166.ap-south-1.compute.internal
20m         Normal    Scheduled                         pod/apisix-etcd-2                              Successfully assigned ingress-apisix/apisix-etcd-2 to ip-172-31-102-32.ap-south-1.compute.internal
20m         Normal    SuccessfulAttachVolume            pod/apisix-etcd-2                              AttachVolume.Attach succeeded for volume "pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3"
20m         Normal    SuccessfulAttachVolume            pod/apisix-etcd-0                              AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
20m         Normal    SuccessfulAttachVolume            pod/apisix-etcd-1                              AttachVolume.Attach succeeded for volume "pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de"
20m         Normal    Started                           pod/apisix-etcd-0                              Started container etcd
20m         Normal    Started                           pod/apisix-etcd-1                              Started container etcd
20m         Normal    Pulled                            pod/apisix-etcd-1                              Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m         Normal    Created                           pod/apisix-etcd-1                              Created container etcd
20m         Normal    Pulled                            pod/apisix-etcd-0                              Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m         Normal    Created                           pod/apisix-etcd-0                              Created container etcd
20m         Normal    Pulled                            pod/apisix-etcd-2                              Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m         Normal    Created                           pod/apisix-etcd-2                              Created container etcd
20m         Normal    Started                           pod/apisix-etcd-2                              Started container etcd
5m1s        Warning   Unhealthy                         pod/apisix-etcd-2                              Readiness probe failed:
5m2s        Warning   Unhealthy                         pod/apisix-etcd-0                              Readiness probe failed:
5m1s        Warning   Unhealthy                         pod/apisix-etcd-1                              Readiness probe failed:
16m         Warning   Unhealthy                         pod/apisix-etcd-2                              Liveness probe failed:
16m         Warning   Unhealthy                         pod/apisix-etcd-1                              Liveness probe failed:
16m         Warning   Unhealthy                         pod/apisix-etcd-0                              Liveness probe failed:
16m         Warning   FailedPreStopHook                 pod/apisix-etcd-0                              Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m         Normal    Killing                           pod/apisix-etcd-2                              Container etcd failed liveness probe, will be restarted
16m         Warning   FailedPreStopHook                 pod/apisix-etcd-2                              Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-2_ingress-apisix(9ee4b468-5674-4dfb-8bf4-dca0b964bbe0)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m         Warning   FailedPreStopHook                 pod/apisix-etcd-1                              Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(6fb9dc93-aabc-4f3a-99d3-6ba10bf3e040)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m         Normal    Killing                           pod/apisix-etcd-1                              Container etcd failed liveness probe, will be restarted
16m         Normal    Killing                           pod/apisix-etcd-0                              Container etcd failed liveness probe, will be restarted

Pod status:

apisix-etcd-0                                0/1     Running   5 (2m53s ago)   23m
apisix-etcd-1                                0/1     Running   5 (2m53s ago)   23m
apisix-etcd-2                                0/1     Running   5 (2m53s ago)   23m

mk describe pod/apisix-etcd-0
Name:         apisix-etcd-0
Namespace:    ingress-apisix
Priority:     0
Node:         ip-172-31-110-110.ap-south-1.compute.internal/172.31.110.110
Start Time:   Tue, 10 Jan 2023 09:28:00 +0530
Labels:       app.kubernetes.io/instance=apisix
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd
              controller-revision-hash=apisix-etcd-648486db84
              helm.sh/chart=etcd-8.5.11
              statefulset.kubernetes.io/pod-name=apisix-etcd-0
Annotations:  checksum/token-secret: f0fcd4104dce3cb310d3f003076edf43dc81011716f2cdcc405202be9ceb3434
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.31.110.254
IPs:
  IP:           172.31.110.254
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   docker://937cac1eaa863b75423d0f2ecf21f0e22a9dd9cbe9cd1f6ea708bda9606ade57
    Image:          docker.io/bitnami/etcd:3.5.6-debian-11-r10
    Image ID:       docker-pullable://bitnami/etcd@sha256:2d7b831769734bb97a5c1cfd2fe46e29f422b70b5ba9f9aedfd91300839ac3ee
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 10 Jan 2023 09:52:06 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 10 Jan 2023 09:48:06 +0530
      Finished:     Tue, 10 Jan 2023 09:52:06 +0530
    Ready:          False
    Restart Count:  6
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-0 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9wdm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-0
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-h9wdm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               25m                default-scheduler        Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
  Normal   SuccessfulAttachVolume  24m                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
  Normal   Pulled                  24m                kubelet                  Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
  Normal   Created                 24m                kubelet                  Created container etcd
  Normal   Started                 24m                kubelet                  Started container etcd
  Warning  Unhealthy               21m (x5 over 23m)  kubelet                  Liveness probe failed:
  Normal   Killing                 21m                kubelet                  Container etcd failed liveness probe, will be restarted
  Warning  FailedPreStopHook       21m                kubelet                  Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
, message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
  Warning  Unhealthy  3m45s (x91 over 23m)  kubelet  Readiness probe failed:
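
For reference when reading the exit codes above: by shell convention, an exit code above 128 usually means the process was ended by signal number (code - 128). So the 137 shown under Last State, and in the first FailedPreStopHook events, would correspond to signal 9 (SIGKILL). This is consistent with the kubelet killing the container after the liveness probe failed, though that is my reading and not something the events state directly. A minimal bash sketch of the arithmetic; `decode_exit` is a hypothetical helper name, not part of kubectl or the chart:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or the Bitnami chart):
# map a container exit code above 128 to the signal that ended the process.
decode_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    # `kill -l <number>` prints the signal name without the SIG prefix.
    echo "exit $code = 128 + $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit $code (not a signal-style exit code)"
  fi
}

decode_exit 137
```

The 128 returned by prestop.sh in the later events is not a signal-style code. It comes with its own error message ("bad member ID arg"), so it has to be read from that message instead.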
