Bug Report
What did you do?
We are using a simple label-selector-based informer in the Flink Kubernetes Operator: https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventSourceUtils.java#L45
In some cases, after a while, the informer could no longer find the target object (a Deployment), even though it definitely existed in Kubernetes (verified manually). Restarting the operator solved the problem.
Based on this, we suspect that the informer stopped receiving new events at some point and never recovered.
Environment
JOSDK: 4.1.1
Java 11
csviri commented on Jan 19, 2023
So if I understand correctly, the resource was in the informer cache before; it is not the case that the informer never received it.
I checked, but this part of the code is very simple on our side: it basically just reads the resource from the informer cache.
I will still add some logging so we can confirm the problem is not in JOSDK.
@manusa @shawkins, have you encountered this problem before?
shawkins commented on Jan 19, 2023
This was likely also captured as fabric8io/kubernetes-client#4781. We can work on the upstream side first, based on the comment over there.
csviri commented on Jan 19, 2023
I discussed this with @gyfora earlier; this seems to be a different issue. TBH, I can't imagine how a resource could be removed from the cache (ItemStore) without a delete event. But yes, let's continue on the fabric8 client side.
shawkins commented on Jan 19, 2023
It shouldn't be possible for it to have existed and then disappear without a delete event being emitted, at least at the informer level. An entry is removed in only two circumstances: a delete event from the watch, or a relist (which should be rare in an environment where bookmarks are supported) in which the item no longer exists. Is it possible that the item was known / cached by the operator SDK and was never populated in the informer cache to begin with?
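To make the removal rule described above concrete, here is a minimal, hypothetical model of an informer item store in plain Java. It is not the fabric8 ItemStore or JOSDK code: the class name InformerCacheModel, its methods, and the "namespace/name" key format are assumptions for illustration only. It encodes just the two removal paths named in the comment: a DELETED watch event, and a relist that no longer contains the item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of an informer cache (NOT the fabric8 implementation).
 * Values are resourceVersions; keys use an assumed "namespace/name" format.
 */
class InformerCacheModel {
    private final Map<String, String> store = new HashMap<>();

    /** ADDED or MODIFIED watch event: insert or replace the cached entry. */
    void onAddOrUpdate(String key, String resourceVersion) {
        store.put(key, resourceVersion);
    }

    /** DELETED watch event: the normal way an entry leaves the cache. */
    void onDelete(String key) {
        store.remove(key);
    }

    /**
     * Relist: the cache is brought in line with the server's current list.
     * Entries absent from the list are dropped; all others are inserted or updated.
     */
    void onRelist(Map<String, String> currentServerState) {
        store.keySet().retainAll(currentServerState.keySet());
        store.putAll(currentServerState);
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(store.get(key));
    }

    public static void main(String[] args) {
        InformerCacheModel cache = new InformerCacheModel();
        cache.onAddOrUpdate("default/my-deployment", "100");
        // No further events arrive (e.g. a stalled watch): the entry stays cached.
        System.out.println(cache.get("default/my-deployment").isPresent()); // true
        // A relist that still contains the object keeps it, with the newer version.
        cache.onRelist(Map.of("default/my-deployment", "120"));
        System.out.println(cache.get("default/my-deployment").orElse("absent")); // 120
        // Only a delete event (or a relist without the object) removes it.
        cache.onDelete("default/my-deployment");
        System.out.println(cache.get("default/my-deployment").isPresent()); // false
    }
}
```

Under this simplified model, a watch that silently stops delivering events leaves the last known state in the cache rather than removing it, which matches the point that an entry should not vanish without a delete event or a relist.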
csviri commented on Jan 19, 2023
No, that is not possible. JOSDK reads resources from the informer cache. There is another layer that maps between the primary custom resource and the secondary resource in this case (this is where I added logging). But if the resource was found before, that is also not possible.
csviri commented on Jan 24, 2023
I think we will need logs for this; I could not think of any scenario in which this could happen.
github-actions commented on Apr 5, 2023
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.