Skip to content

fix(livestore): stop periodic WAL flush goroutines before shutdown flush#6972

Merged
zhxiaogg merged 4 commits intografana:mainfrom
zhxiaogg:worktree-fix-concurrent-wal-flush-on-shutdown
Apr 15, 2026
Merged

fix(livestore): stop periodic WAL flush goroutines before shutdown flush#6972
zhxiaogg merged 4 commits intografana:mainfrom
zhxiaogg:worktree-fix-concurrent-wal-flush-on-shutdown

Conversation

@zhxiaogg
Copy link
Copy Markdown
Contributor

@zhxiaogg zhxiaogg commented Apr 15, 2026

What this PR does:

  • added a mechanism to stop the periodic WAL flushes before shutdown flush, to avoid potential concurrent writes to the same WAL

The startup/shutdown procedure now:

  1. during startup, livestore will create tenant instances at startup, with a go routine for the periodic WAL flush. For the periodic flush:
    1. it will not start until livestore is fully initialized
    2. it will stop when get signal from cutToWalStop channel
  2. during shutdown, before livestore trying to flush all livetraces to WAL, it will signal all periodic WAL flush loops to stop via cutToWalStop
    1. this guarantees no concurrent writes to WALs

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copilot AI review requested due to automatic review settings April 15, 2026 14:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent concurrent writes to the same WAL during shutdown by stopping the per-tenant periodic “cut to WAL” loops before performing the final shutdown flush.

Changes:

  • Added a dedicated stop channel and waitgroup to stop/wait for per-tenant periodic WAL flush goroutines during shutdown.
  • Updated perTenantCutToWalLoop to stop on the new stop signal rather than the service context.
  • Adjusted tests to stop the service via services.StopAndAwaitTerminated(...) instead of calling liveStore.stopping(nil) directly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
modules/livestore/live_store.go Adds shutdown coordination (cutToWalStop, cutToWalWg) and stops periodic WAL flush goroutines before shutdown flush.
modules/livestore/live_store_background.go Hooks per-tenant periodic WAL flush loop into the new stop mechanism.
modules/livestore/live_store_test.go Updates stopping-related tests to use the service stop path.

Comment thread modules/livestore/live_store_background.go
Comment thread modules/livestore/live_store_background.go
Comment thread modules/livestore/live_store_test.go
Comment thread modules/livestore/live_store_test.go
Comment thread modules/livestore/live_store_test.go
Comment thread modules/livestore/live_store_test.go
Comment thread modules/livestore/live_store.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread modules/livestore/live_store_background.go
Comment thread modules/livestore/live_store_test.go
case <-ticker.C:
s.cutOneInstanceToWal(s.ctx, instance, false)
case <-s.ctx.Done():
case <-s.startupComplete:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment that we will continue if startupComplete is met first we will continue. Copilot also mentions there is not a context check here, thats probably implicit in the other two but if we an either A pass it in, or document the behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment and PR description for the livestore startup/shutdown process around WAL completion.

Copy link
Copy Markdown
Contributor

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice fix

Copy link
Copy Markdown
Contributor

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@zhxiaogg zhxiaogg merged commit 970e084 into grafana:main Apr 15, 2026
44 of 45 checks passed
@zhxiaogg zhxiaogg deleted the worktree-fix-concurrent-wal-flush-on-shutdown branch April 15, 2026 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants