
@EronWright
Contributor

@EronWright EronWright commented Mar 12, 2025

Proposed changes

This PR fixes a race condition in the await logic that can cause Pulumi to stall until the await timeout is reached (e.g., after 600s). The deletion condition learns about the object's deletion from two concurrent sources: informer (watch) events and GET call(s). In the edge case where the 'deleted' watch event is observed before the GET call completes, the getter overwrites the watcher's observation (stored in the cond.deleted field). The system may then become stuck, because the watcher will not report the deletion a second time.

In other words, setting the deleted flag should be idempotent and monotonic: it may transition from false to true, but never from true to false.
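To make the failure mode concrete, here is a minimal, deterministic replay of the interleaving described above. It is a sketch, not the provider's code: `deletedCondition`, `onWatchDeleted`, `onGetResultBuggy`, `onGetResultFixed`, and `replay` are hypothetical names, and it assumes the flag is a `sync/atomic` `Bool` (the PR only shows `dc.deleted.Store(...)`). The GET is issued while the object still exists, the watcher then sees the deletion, and only afterwards is the stale GET result applied.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// deletedCondition is a hypothetical, stripped-down stand-in for the await
// condition; only the deleted flag matters for this race.
type deletedCondition struct {
	deleted atomic.Bool
}

// onWatchDeleted models the watcher goroutine observing a 'deleted' event.
func (c *deletedCondition) onWatchDeleted() {
	c.deleted.Store(true)
}

// onGetResultBuggy models the pre-fix getter: a successful GET stores false,
// which can clobber a true value already stored by the watcher.
func (c *deletedCondition) onGetResultBuggy(exists bool) {
	if exists {
		c.deleted.Store(false)
	} else {
		c.deleted.Store(true)
	}
}

// onGetResultFixed models the post-fix getter: it only ever stores true.
func (c *deletedCondition) onGetResultFixed(exists bool) {
	if !exists {
		c.deleted.Store(true)
	}
}

// replay applies the problematic ordering: the watcher records the deletion
// first, then a GET response captured before the deletion is applied.
func replay(fixed bool) bool {
	var c deletedCondition
	getSawObject := true // the GET was served while the object still existed
	c.onWatchDeleted()
	if fixed {
		c.onGetResultFixed(getSawObject)
	} else {
		c.onGetResultBuggy(getSawObject)
	}
	return c.deleted.Load()
}

func main() {
	fmt.Println("buggy:", replay(false)) // buggy: false
	fmt.Println("fixed:", replay(true))  // fixed: true
}
```

Only the buggy getter loses the watcher's observation; the fixed getter never writes false, so the outcome no longer depends on which goroutine finishes last.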

Also, the informer can be started more reliably through the factory, because factory.Start is idempotent: it starts only those registered informers that are not already running. Starting it this way ensures that the informer is accounted for during factory shutdown and that WaitForCacheSync works as expected.
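The factory behavior this change relies on can be modeled in a few lines. The sketch below is a simplified, hypothetical `toyFactory` loosely modeled on client-go's shared informer factories, not their actual code: `ForResource` registers an informer without starting it, `Start` launches only informers not yet started (so repeated calls are safe), `WaitForCacheSync` waits only on informers that `Start` launched, and `Shutdown` blocks until those goroutines exit. In this model, an informer launched directly in its own goroutine would be invisible to both `WaitForCacheSync` and `Shutdown`.

```go
package main

import (
	"fmt"
	"sync"
)

// toyInformer is a hypothetical stand-in for a shared informer; synced is
// closed once its (pretend) initial LIST has completed.
type toyInformer struct {
	synced chan struct{}
}

// toyFactory is a hypothetical, simplified model of an informer factory.
type toyFactory struct {
	mu        sync.Mutex
	wg        sync.WaitGroup
	informers map[string]*toyInformer
	started   map[string]bool
}

func newToyFactory() *toyFactory {
	return &toyFactory{informers: map[string]*toyInformer{}, started: map[string]bool{}}
}

// ForResource registers (or returns) the informer for a resource without starting it.
func (f *toyFactory) ForResource(resource string) *toyInformer {
	f.mu.Lock()
	defer f.mu.Unlock()
	if inf, ok := f.informers[resource]; ok {
		return inf
	}
	inf := &toyInformer{synced: make(chan struct{})}
	f.informers[resource] = inf
	return inf
}

// Start launches every registered informer that has not been started yet,
// so calling it more than once is safe.
func (f *toyFactory) Start(stop <-chan struct{}) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for resource, inf := range f.informers {
		if f.started[resource] {
			continue
		}
		f.started[resource] = true
		f.wg.Add(1)
		go func(inf *toyInformer) {
			defer f.wg.Done()
			close(inf.synced) // pretend the initial LIST completed
			<-stop            // run until the stop channel is closed
		}(inf)
	}
}

// WaitForCacheSync waits only on informers that Start has launched.
func (f *toyFactory) WaitForCacheSync(stop <-chan struct{}) map[string]bool {
	f.mu.Lock()
	pending := map[string]*toyInformer{}
	for resource, inf := range f.informers {
		if f.started[resource] {
			pending[resource] = inf
		}
	}
	f.mu.Unlock()

	result := map[string]bool{}
	for resource, inf := range pending {
		select {
		case <-inf.synced:
			result[resource] = true
		case <-stop:
			result[resource] = false
		}
	}
	return result
}

// Shutdown blocks until every informer goroutine started by Start returns.
func (f *toyFactory) Shutdown() {
	f.wg.Wait()
}

func main() {
	stop := make(chan struct{})
	f := newToyFactory()

	f.Start(stop) // nothing registered yet, so this is a no-op
	fmt.Println(len(f.WaitForCacheSync(stop))) // 0

	f.ForResource("apps/v1/deployments")
	f.Start(stop)
	f.Start(stop) // idempotent: the informer is not launched twice
	fmt.Println(f.WaitForCacheSync(stop)) // map[apps/v1/deployments:true]

	close(stop)
	f.Shutdown() // returns once the started informer goroutine exits
}
```

The first `Start` call also illustrates why starting a factory before any informer is registered does nothing.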

Related issues (optional)

closes #3317

@EronWright EronWright requested a review from a team March 12, 2025 16:27
@EronWright EronWright self-assigned this Mar 12, 2025
@EronWright EronWright requested a review from rquitales March 12, 2025 16:28
@github-actions

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

@codecov

codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.13%. Comparing base (93f8a5e) to head (862e652).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3550   +/-   ##
=======================================
  Coverage   41.13%   41.13%           
=======================================
  Files          87       87           
  Lines       12910    12906    -4     
=======================================
- Hits         5310     5309    -1     
+ Misses       7205     7203    -2     
+ Partials      395      394    -1     


@EronWright EronWright requested a review from blampe March 13, 2025 01:26
factory.WaitForCacheSync(des.stopper)
// Start the new informer by calling factory.Start (which is idempotent).
// This ensures that the informer is started exactly once and is cleaned up later.
factory.Start(des.stopper)
Contributor Author

@EronWright EronWright Mar 13, 2025


See the implementations of factory.Start and factory.WaitForCacheSync. Note that the informer was created earlier by the factory (via ForResource), so it is already registered with the factory and factory.Start will launch it.

close(stopper)
factory.Shutdown()
}()
factory.Start(stopper)
Contributor Author

@EronWright EronWright Mar 13, 2025


Rationale: There are no informers to start at this point, so this is a no-op. See factory.Start.

_, err := dc.getter.Get(ctx, dc.Object().GetName(), metav1.GetOptions{})
if err == nil {
// Still exists.
dc.deleted.Store(false)
Contributor Author

@EronWright EronWright Mar 13, 2025


Rationale: the deleted field should only ever transition from false to true, never from true to false. This line was presumably assumed to be a no-op; in practice, it can overwrite the result already stored by the watcher goroutine.

@EronWright EronWright enabled auto-merge (squash) March 14, 2025 02:48
@EronWright EronWright merged commit 8e82519 into master Mar 14, 2025
19 checks passed
@EronWright EronWright deleted the issue-3317 branch March 14, 2025 03:20
@pulumi-bot
Contributor

This PR has been shipped in release v4.22.1.

@pulumi-bot
Contributor

This PR has been shipped in release v4.22.0.


Development

Successfully merging this pull request may close these issues.

Await logic for deletion is sometimes slow
