Description
Adaptation unit tests fail, ever since c4893c7. The reason is that while that commit fixed a locking inversion bug, it now opened a time window for a plugin to lose events during registration after it has received the initial set of pods and containers but before it is has been fully registered for event processing.
Synchronizing the plugin and adding it to the slice of registered plugins both used to happen with Adaptation.Mutex
held. Now that mutex is held only after synchronization is done while updating the slice of plugins. As a consequence to the adaptation test suite while Suite.WaitForPluginsToSync()
used to return only once the plugin was fully registered (IOW, added to slice of plugins), now it returns before that. Therefore, it became possible for a pod/container event to come when a plugin is halfway through its registration: already synchronized with existing pods/containers, but not added yet to the slice of plugins. When that happens, the plugin loses any event that comes before it has been added to the slice of plugins.
The effect is not limited to unit tests, but it becomes apparent as unit tests banging full steam registering plugins and generating pod/container events now trigger this race condition reliably. Similar NRI pod/container event loss is also possible now in the runtimes.