etcd crashes in EKS cluster #5
Description
Following your article, I checked which etcd chart version is available:
helm search repo bitnami | grep etcd
bitnami/etcd 8.5.11 3.5.6 etcd is a distributed key-value store designed ...
I found that Helm chart 8.5.11 provides etcd version 3.5.6, so I upgraded my existing APISIX installation by updating the version in charts.yaml:
helm dependency list ./charts/apisix
NAME VERSION REPOSITORY STATUS
etcd 8.5.11 https://charts.bitnami.com/bitnami ok
apisix-dashboard 0.6.1 https://charts.apiseven.com/ ok
apisix-ingress-controller 0.11.1 https://charts.apiseven.com/ ok
helm upgrade apisix ./charts/apisix --set gateway.type=LoadBalancer --set allow.ipList="{0.0.0.0/0}" --set ingress-controller.enabled=true --namespace ingress-apisix --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix --set gateway.tls.enabled=true --set ingress-controller.config.apisix.adminKey=x --set admin.credentials.admin=xxxxx --set xxxx admin.credentials.viewer=xxxxx --set ingressController.config.apisix.baseURL=http://apisix-admin:9180/apisix/admin --set dashboard.enabled=true
However, etcd still crashes:
mk logs -f apisix-etcd-0
etcd 04:02:06.39
etcd 04:02:06.39 Welcome to the Bitnami etcd container
etcd 04:02:06.39 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 04:02:06.39 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 04:02:06.39
etcd 04:02:06.39 INFO ==> ** Starting etcd setup **
etcd 04:02:06.41 INFO ==> Validating settings in ETCD_* env vars..
etcd 04:02:06.41 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 04:02:06.41 INFO ==> Initializing etcd
etcd 04:02:06.41 INFO ==> Generating etcd config file using env variables
etcd 04:02:06.43 INFO ==> There is no data from previous deployments
etcd 04:02:06.44 INFO ==> Adding new member to existing cluster
etcd 04:02:16.59 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:36.68 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:56.76 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:16.84 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:36.91 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:57.00 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:17.08 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:37.15 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:04:57.27 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:17.33 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:37.43 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:05:57.53 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
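To make the retry cadence in the log above easier to see, here is a minimal Python sketch (an illustration only, not part of the Bitnami scripts) that computes the gaps between consecutive `Cluster not healthy` warnings. The `LOG` string is copied from the first four warning lines above, and the function name `warning_gaps` is hypothetical.

```python
from datetime import datetime

# First four warning lines copied from the log above (timestamps are HH:MM:SS.ff).
LOG = """\
etcd 04:02:16.59 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:36.68 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:02:56.76 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
etcd 04:03:16.84 WARN ==> Cluster not healthy, not adding self to cluster for now, keeping trying...
"""


def warning_gaps(log):
    """Return the seconds between consecutive 'Cluster not healthy' warnings."""
    stamps = [
        datetime.strptime(line.split()[1], "%H:%M:%S.%f")
        for line in log.splitlines()
        if "Cluster not healthy" in line
    ]
    # Pair each timestamp with the next one and take the difference.
    return [round((b - a).total_seconds(), 2) for a, b in zip(stamps, stamps[1:])]


gaps = warning_gaps(LOG)
print(gaps)
```

The gaps come out at roughly 20 seconds each, so the container keeps retrying at a steady interval and never gets past the "adding new member" step before it is restarted.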
These are the events from the Kubernetes cluster:
21m Warning FailedPreStopHook pod/apisix-etcd-0 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(a9c0fe68-6cec-4934-9934-43678e34977f)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 137: , message: ""
21m Warning FailedPreStopHook pod/apisix-etcd-1 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(0e9be7b4-c34a-4992-97cf-a8426766534a)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 137: , message: ""
20m Warning FailedMount pod/apisix-etcd-2 Unable to attach or mount volumes: unmounted volumes=[data etcd-jwt-token kube-api-access-jtccd], unattached volumes=[data etcd-jwt-token kube-api-access-jtccd]: timed out waiting for the condition
20m Normal NoPods poddisruptionbudget/apisix-etcd No matching pods found
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-1 Pod apisix-etcd-1 in StatefulSet apisix-etcd success
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-0 waiting for first consumer to be created before binding
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-1 waiting for first consumer to be created before binding
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-1 in StatefulSet apisix-etcd successful
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-0 in StatefulSet apisix-etcd successful
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-0 Pod apisix-etcd-0 in StatefulSet apisix-etcd success
20m Normal SuccessfulCreate statefulset/apisix-etcd create Pod apisix-etcd-2 in StatefulSet apisix-etcd successful
20m Normal WaitForFirstConsumer persistentvolumeclaim/data-apisix-etcd-2 waiting for first consumer to be created before binding
20m Normal SuccessfulCreate statefulset/apisix-etcd create Claim data-apisix-etcd-2 Pod apisix-etcd-2 in StatefulSet apisix-etcd success
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-0 Successfully provisioned volume pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7 using kubernetes.io/aws-ebs
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-1 Successfully provisioned volume pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de using kubernetes.io/aws-ebs
20m Normal ProvisioningSucceeded persistentvolumeclaim/data-apisix-etcd-2 Successfully provisioned volume pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3 using kubernetes.io/aws-ebs
20m Normal Scheduled pod/apisix-etcd-0 Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
20m Normal Scheduled pod/apisix-etcd-1 Successfully assigned ingress-apisix/apisix-etcd-1 to ip-172-31-118-166.ap-south-1.compute.internal
20m Normal Scheduled pod/apisix-etcd-2 Successfully assigned ingress-apisix/apisix-etcd-2 to ip-172-31-102-32.ap-south-1.compute.internal
20m Normal SuccessfulAttachVolume pod/apisix-etcd-2 AttachVolume.Attach succeeded for volume "pvc-3537fab5-57c6-4386-9d4a-3f623b2b4db3"
20m Normal SuccessfulAttachVolume pod/apisix-etcd-0 AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
20m Normal SuccessfulAttachVolume pod/apisix-etcd-1 AttachVolume.Attach succeeded for volume "pvc-fbc18bc2-7e9a-4a2e-9311-bbd446a846de"
20m Normal Started pod/apisix-etcd-0 Started container etcd
20m Normal Started pod/apisix-etcd-1 Started container etcd
20m Normal Pulled pod/apisix-etcd-1 Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m Normal Created pod/apisix-etcd-1 Created container etcd
20m Normal Pulled pod/apisix-etcd-0 Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m Normal Created pod/apisix-etcd-0 Created container etcd
20m Normal Pulled pod/apisix-etcd-2 Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
20m Normal Created pod/apisix-etcd-2 Created container etcd
20m Normal Started pod/apisix-etcd-2 Started container etcd
5m1s Warning Unhealthy pod/apisix-etcd-2 Readiness probe failed:
5m2s Warning Unhealthy pod/apisix-etcd-0 Readiness probe failed:
5m1s Warning Unhealthy pod/apisix-etcd-1 Readiness probe failed:
16m Warning Unhealthy pod/apisix-etcd-2 Liveness probe failed:
16m Warning Unhealthy pod/apisix-etcd-1 Liveness probe failed:
16m Warning Unhealthy pod/apisix-etcd-0 Liveness probe failed:
16m Warning FailedPreStopHook pod/apisix-etcd-0 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Normal Killing pod/apisix-etcd-2 Container etcd failed liveness probe, will be restarted
16m Warning FailedPreStopHook pod/apisix-etcd-2 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-2_ingress-apisix(9ee4b468-5674-4dfb-8bf4-dca0b964bbe0)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Warning FailedPreStopHook pod/apisix-etcd-1 Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-1_ingress-apisix(6fb9dc93-aabc-4f3a-99d3-6ba10bf3e040)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex...
16m Normal Killing pod/apisix-etcd-1 Container etcd failed liveness probe, will be restarted
16m Normal Killing pod/apisix-etcd-0 Container etcd failed liveness probe, will be restarted
Pod status:
apisix-etcd-0 0/1 Running 5 (2m53s ago) 23m
apisix-etcd-1 0/1 Running 5 (2m53s ago) 23m
apisix-etcd-2 0/1 Running 5 (2m53s ago) 23m
mk describe pod/apisix-etcd-0
Name: apisix-etcd-0
Namespace: ingress-apisix
Priority: 0
Node: ip-172-31-110-110.ap-south-1.compute.internal/172.31.110.110
Start Time: Tue, 10 Jan 2023 09:28:00 +0530
Labels: app.kubernetes.io/instance=apisix
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=etcd
controller-revision-hash=apisix-etcd-648486db84
helm.sh/chart=etcd-8.5.11
statefulset.kubernetes.io/pod-name=apisix-etcd-0
Annotations: checksum/token-secret: f0fcd4104dce3cb310d3f003076edf43dc81011716f2cdcc405202be9ceb3434
kubernetes.io/psp: eks.privileged
Status: Running
IP: 172.31.110.254
IPs:
IP: 172.31.110.254
Controlled By: StatefulSet/apisix-etcd
Containers:
etcd:
Container ID: docker://937cac1eaa863b75423d0f2ecf21f0e22a9dd9cbe9cd1f6ea708bda9606ade57
Image: docker.io/bitnami/etcd:3.5.6-debian-11-r10
Image ID: docker-pullable://bitnami/etcd@sha256:2d7b831769734bb97a5c1cfd2fe46e29f422b70b5ba9f9aedfd91300839ac3ee
Ports: 2379/TCP, 2380/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Tue, 10 Jan 2023 09:52:06 +0530
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 10 Jan 2023 09:48:06 +0530
Finished: Tue, 10 Jan 2023 09:52:06 +0530
Ready: False
Restart Count: 6
Liveness: exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
Readiness: exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
Environment:
BITNAMI_DEBUG: false
MY_POD_IP: (v1:status.podIP)
MY_POD_NAME: apisix-etcd-0 (v1:metadata.name)
MY_STS_NAME: apisix-etcd
ETCDCTL_API: 3
ETCD_ON_K8S: yes
ETCD_START_FROM_SNAPSHOT: no
ETCD_DISASTER_RECOVERY: no
ETCD_NAME: $(MY_POD_NAME)
ETCD_DATA_DIR: /bitnami/etcd/data
ETCD_LOG_LEVEL: info
ALLOW_NONE_AUTHENTICATION: yes
ETCD_AUTH_TOKEN: jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
ETCD_ADVERTISE_CLIENT_URLS: http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS: http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
ETCD_INITIAL_CLUSTER_TOKEN: etcd-cluster-k8s
ETCD_INITIAL_CLUSTER_STATE: existing
ETCD_INITIAL_CLUSTER: apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
ETCD_CLUSTER_DOMAIN: apisix-etcd-headless.ingress-apisix.svc.cluster.local
Mounts:
/bitnami/etcd from data (rw)
/opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9wdm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-apisix-etcd-0
ReadOnly: false
etcd-jwt-token:
Type: Secret (a volume populated by a Secret)
SecretName: apisix-etcd-jwt-token
Optional: false
kube-api-access-h9wdm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned ingress-apisix/apisix-etcd-0 to ip-172-31-110-110.ap-south-1.compute.internal
Normal SuccessfulAttachVolume 24m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6b3d1b5c-b1a6-4bc0-9ff2-32de868e4cc7"
Normal Pulled 24m kubelet Container image "docker.io/bitnami/etcd:3.5.6-debian-11-r10" already present on machine
Normal Created 24m kubelet Created container etcd
Normal Started 24m kubelet Started container etcd
Warning Unhealthy 21m (x5 over 23m) kubelet Liveness probe failed:
Normal Killing 21m kubelet Container etcd failed liveness probe, will be restarted
Warning FailedPreStopHook 21m kubelet Exec lifecycle hook ([/opt/bitnami/scripts/etcd/prestop.sh]) for Container "etcd" in Pod "apisix-etcd-0_ingress-apisix(3ed8842f-ec58-47d4-b315-ce8a79328578)" failed - error: command '/opt/bitnami/scripts/etcd/prestop.sh' exited with 128: Error: bad member ID arg (strconv.ParseUint: parsing "": invalid syntax), expecting ID in Hex
, message: "Error: bad member ID arg (strconv.ParseUint: parsing \"\": invalid syntax), expecting ID in Hex\n"
Warning Unhealthy 3m45s (x91 over 23m) kubelet Readiness probe failed:
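
As a rough cross-check of the restart timing, the following Python sketch (my own illustration, not taken from the chart) estimates the earliest point at which the liveness probe shown in the describe output (`delay=60s period=30s #failure=5`) can trigger a restart if every probe fails. It assumes a simplified model in which the first probe runs right after the initial delay and then repeats once per period. The kubelet adds jitter, and the preStop hook and termination grace period add further delay, so the result should be treated as a lower bound. The function name `earliest_liveness_kill` is hypothetical.

```python
from datetime import datetime


def earliest_liveness_kill(delay_s, period_s, failure_threshold):
    """Seconds after container start at which the Nth consecutive failure can occur."""
    # First failure at delay_s, each further failure one period later.
    return delay_s + (failure_threshold - 1) * period_s


# Values from the "Liveness:" line in the describe output above.
estimate = earliest_liveness_kill(delay_s=60, period_s=30, failure_threshold=5)

# "Last State" timestamps of the previous container, from the same output.
started = datetime.strptime("09:48:06", "%H:%M:%S")
finished = datetime.strptime("09:52:06", "%H:%M:%S")
observed = int((finished - started).total_seconds())

print(f"earliest probe-triggered kill: {estimate}s, observed lifetime: {observed}s")
```

Under these assumptions the previous container lived longer than the lower bound, which appears consistent with the `Container etcd failed liveness probe, will be restarted` events: the pods are being killed by the probe while still waiting to join the cluster, rather than exiting on their own.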