This repository was archived by the owner on May 8, 2025. It is now read-only.

FlinkOperator crashes when deploying a new Job in a FlinkCluster #408

Open
@morelina

Description

I am trying to update a running Job in my Flink Job Cluster. I am using a commit that is still in an open PR and includes some fixes: 72e89b2

FlinkOperator triggers the savepoint, and it is created successfully. However, flink-operator crashes immediately afterwards.

These are the events in FlinkCluster:
Normal SavepointCreated 17m FlinkOperator Successfully savepoint created
Normal SavepointTriggered 12m FlinkOperator Triggered savepoint for update: triggerID 590c343c5e3934e4996e5904b719cf17.
Normal SavepointCreated 12m FlinkOperator Successfully savepoint created
Normal SavepointTriggered 7m1s FlinkOperator Triggered savepoint for update: triggerID 51660f2c77254db025d37e23c0fa57e7.
Normal SavepointCreated 6m56s FlinkOperator Successfully savepoint created
Normal SavepointTriggered 107s FlinkOperator Triggered savepoint for update: triggerID ffc1ff58f0ee872a278fab5b

And these are the logs from the crash:

controllers.FlinkCluster ---------- 4. Take actions ---------- {"cluster": "namespace-a/cluster-a"}
controllers.FlinkCluster ConfigMap already exists, no action {"cluster": "namespace-a/cluster-a"}
controllers.FlinkCluster Statefulset already exists, no action {"cluster": "namespace-a/cluster-a", "component": "JobManager"}
controllers.FlinkCluster JobManager service already exists, no action {"cluster": "namespace-a/cluster-a"}
controllers.FlinkCluster Statefulset already exists, no action {"cluster": "namespace-a/cluster-a", "component": "TaskManager"}
controllers.FlinkCluster Job is about to be restarted to update {"cluster": "namespace-a/cluster-a"}
E0208 17:15:08.662342 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 362 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1422fc0, 0x2241f50)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1422fc0, 0x2241f50)
/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/googlecloudplatform/flink-operator/controllers.(*ClusterReconciler).reconcileJob(0xc001cad4a0, 0x15c2f00, 0x0, 0x0, 0x0)
/workspace/controllers/flinkcluster_reconciler.go:511 +0x5fb
github.com/googlecloudplatform/flink-operator/controllers.(*ClusterReconciler).reconcile(0xc001cad4a0, 0xc001cad4a0, 0x25, 0x0, 0x0)
/workspace/controllers/flinkcluster_reconciler.go:111 +0x223
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile(0xc000d55b28, 0xc0012a7350, 0x10, 0xc00127db20, 0x1d, 0xc0013e6800, 0x36e54d7482f9f143, 0x13b94c0, 0xc0013e6770)
/workspace/controllers/flinkcluster_controller.go:220 +0xb91
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile(0xc0007165a0, 0xc0012a7350, 0x10, 0xc00127db20, 0x1d, 0x0, 0xc0007a47273aff35, 0xc000724360, 0xc000724128)
/workspace/controllers/flinkcluster_controller.go:82 +0x249
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000c89c0, 0x1475780, 0xc0013e6760, 0x0)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x161
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000c89c0, 0xc00051ee00)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xae
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0000c89c0)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00007a950)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007a950, 0x17b1ee0, 0xc0004362d0, 0x1668101, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00007a950, 0x3b9aca00, 0x0, 0x1, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00007a950, 0x3b9aca00, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x305
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12eee3b]

goroutine 362 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
panic(0x1422fc0, 0x2241f50)
/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/googlecloudplatform/flink-operator/controllers.(*ClusterReconciler).reconcileJob(0xc001cad4a0, 0x15c2f00, 0x0, 0x0, 0x0)
/workspace/controllers/flinkcluster_reconciler.go:511 +0x5fb
github.com/googlecloudplatform/flink-operator/controllers.(*ClusterReconciler).reconcile(0xc001cad4a0, 0xc001cad4a0, 0x25, 0x0, 0x0)
/workspace/controllers/flinkcluster_reconciler.go:111 +0x223
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile(0xc000d55b28, 0xc0012a7350, 0x10, 0xc00127db20, 0x1d, 0xc0013e6800, 0x36e54d7482f9f143, 0x13b94c0, 0xc0013e6770)
/workspace/controllers/flinkcluster_controller.go:220 +0xb91
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile(0xc0007165a0, 0xc0012a7350, 0x10, 0xc00127db20, 0x1d, 0x0, 0xc0007a47273aff35, 0xc000724360, 0xc000724128)
/workspace/controllers/flinkcluster_controller.go:82 +0x249
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000c89c0, 0x1475780, 0xc0013e6760, 0x0)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x161
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000c89c0, 0xc00051ee00)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xae
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0000c89c0)
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00007a950)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007a950, 0x17b1ee0, 0xc0004362d0, 0x1668101, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00007a950, 0x3b9aca00, 0x0, 0x1, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00007a950, 0x3b9aca00, 0xc000114360)
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x305
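The trace points at a nil pointer dereference inside reconcileJob (flinkcluster_reconciler.go:511), reached right after the "Job is about to be restarted to update" log line. I have not confirmed which value is nil. As a minimal sketch of the kind of guard that avoids this class of panic, assuming the code reads an optional status pointer during the update: the types `ClusterStatus` and `JobStatus`, the helper `savepointLocation`, and the example path are all hypothetical and are not the operator's actual API.

```go
package main

import "fmt"

// JobStatus and ClusterStatus are hypothetical stand-ins for illustration;
// they are NOT the operator's real API types.
type JobStatus struct {
	SavepointLocation string
}

type ClusterStatus struct {
	Job *JobStatus // optional: nil until a job status has been recorded
}

// savepointLocation returns the recorded savepoint location and reports
// whether one is available. It checks every optional pointer before
// dereferencing it, so a missing status cannot cause a panic.
func savepointLocation(status *ClusterStatus) (string, bool) {
	if status == nil || status.Job == nil {
		return "", false
	}
	loc := status.Job.SavepointLocation
	return loc, loc != ""
}

func main() {
	// Reading empty.Job.SavepointLocation directly would panic with
	// "invalid memory address or nil pointer dereference".
	empty := &ClusterStatus{}
	if _, ok := savepointLocation(empty); !ok {
		fmt.Println("no savepoint recorded; not restarting the job yet")
	}

	// Hypothetical savepoint path, for illustration only.
	ready := &ClusterStatus{Job: &JobStatus{SavepointLocation: "gs://example-bucket/savepoint-1"}}
	if loc, ok := savepointLocation(ready); ok {
		fmt.Println("restarting job from", loc)
	}
}
```

With a guard like this, the reconcile loop could return early and retry on the next pass instead of taking the whole operator process down.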
