[bugfix] correct lag calculation #6405

Merged
ruslan-mikhailov merged 5 commits into grafana:main from
ruslan-mikhailov:bugfix/correct-lag-calculation
Feb 9, 2026

Conversation

@ruslan-mikhailov
Contributor

@ruslan-mikhailov ruslan-mikhailov commented Feb 6, 2026

What this PR does: fixes the lag calculation. This PR contains a couple of bugfixes:

  1. Calculate lag correctly, as the difference between the HWM (high watermark) and the latest committed offset.
    a. If consumption fails right after the first iteration, the lag cannot be determined and is set to -1. Note: the current code does not return an error, so this is a no-op change.
    b. If there is no error and no recorded offset, there are no new records, so the lag is zero.
  2. Record lag only after the consume function call.
  3. If none of the records in a batch could be consumed (e.g. record.Timestamp.Before(cutoff)), we still need to commit the batch. Currently, it is committed only if at least one message was pushed to inst.pushBytes.
  4. Wait for catch-up only after all processes have started. This fixes a deadlock in which live-store has not yet started block processing but is already consuming and is blocked by waitForCatchUp.
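To make items 1 and 3 concrete, here is a minimal sketch in Go. It is not the code from this PR: `computeLag`, `consumeBatch`, `record`, `lagUnknown` and `errFetch` are hypothetical names, and two details are assumptions made for illustration: the committed offset is taken to be the last seen offset plus one (the usual Kafka convention), and a negative difference is clamped to zero.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// lagUnknown marks a lag that could not be determined (item 1a).
const lagUnknown int64 = -1

// errFetch is a placeholder error used only in this sketch.
var errFetch = errors.New("fetch failed")

// record is a hypothetical stand-in for a consumed Kafka record.
type record struct {
	Offset    int64
	Timestamp time.Time
}

// computeLag returns hwm - committed. committed is only meaningful when
// hasCommitted is true. Hypothetical helper, not the PR's function.
func computeLag(hwm, committed int64, hasCommitted bool, consumeErr error) int64 {
	if !hasCommitted {
		if consumeErr != nil {
			return lagUnknown // 1a: error before any offset was recorded
		}
		return 0 // 1b: no error and no offset means no new records
	}
	if lag := hwm - committed; lag > 0 {
		return lag
	}
	return 0 // clamping is an assumption of this sketch
}

// consumeBatch pushes records that are not older than cutoff and returns the
// offset to commit. The offset advances for skipped records too (item 3).
func consumeBatch(recs []record, cutoff time.Time, push func(record)) (commit int64, ok bool) {
	for _, r := range recs {
		if !r.Timestamp.Before(cutoff) {
			push(r)
		}
		commit, ok = r.Offset+1, true // assumed convention: next offset to read
	}
	return commit, ok
}

func main() {
	now := time.Now()
	cutoff := now.Add(-time.Hour)
	// Every record is older than the cutoff, so nothing is pushed.
	old := []record{
		{Offset: 41, Timestamp: now.Add(-2 * time.Hour)},
		{Offset: 42, Timestamp: now.Add(-3 * time.Hour)},
	}
	pushed := 0
	commit, ok := consumeBatch(old, cutoff, func(record) { pushed++ })

	fmt.Println(pushed, commit, ok)                 // 0 43 true
	fmt.Println(computeLag(50, commit, ok, nil))    // 7
	fmt.Println(computeLag(50, 0, false, errFetch)) // -1
	fmt.Println(computeLag(50, 0, false, nil))      // 0
}
```

The point of the sketch is that the commit offset is derived from the records seen, not from the records pushed, so a batch that is skipped entirely still moves the committed offset forward and the lag shrinks.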

How it has been tested: unit tests (in the PR) and manual tests.
Manual test case:

  1. Add a long sleep to consumption to make sure live-store is lagging.
  2. Run it for at least 10 minutes (ideally 30+ minutes).
  3. Remove the long sleep and enable the readiness probe.

Expected result: it is running, but in a non-ready state. After it catches up, it becomes ready.
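The expected behaviour can be pictured as a readiness check gated on lag. The sketch below is only an illustration under stated assumptions: `isReady` and the `maxLag` threshold are hypothetical and are not taken from this PR, and a lag of -1 is treated as "unknown" in line with item 1a above.

```go
package main

import "fmt"

// isReady is a hypothetical readiness check, not the PR's implementation.
// started reports whether all processes have been started (item 4),
// lag is the current consumer lag (-1 means unknown), and maxLag is an
// assumed threshold below which the service counts as caught up.
func isReady(started bool, lag, maxLag int64) bool {
	if !started {
		return false // do not evaluate catch-up before everything is running
	}
	if lag < 0 {
		return false // unknown lag: stay non-ready
	}
	return lag <= maxLag
}

func main() {
	fmt.Println(isReady(true, 5000, 10)) // false: running but still lagging
	fmt.Println(isReady(true, 3, 10))    // true: caught up
}
```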

The actual result matches the expected result:
image

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/correct-lag-calculation branch 3 times, most recently from 539479b to b996093 February 9, 2026 09:19
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/correct-lag-calculation branch from e5d3e68 to 8152072 February 9, 2026 12:25
@oleg-kozlyuk-grafana
Contributor

Excellent job, thank you for fixing these issues!

Comment thread CHANGELOG.md
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/correct-lag-calculation branch from 8152072 to e4849f9 February 9, 2026 12:56
@ruslan-mikhailov ruslan-mikhailov enabled auto-merge (squash) February 9, 2026 12:58
@ruslan-mikhailov ruslan-mikhailov enabled auto-merge (squash) February 9, 2026 13:02
@ruslan-mikhailov ruslan-mikhailov merged commit 32158c0 into grafana:main Feb 9, 2026
39 of 40 checks passed
@ruslan-mikhailov ruslan-mikhailov deleted the bugfix/correct-lag-calculation branch February 9, 2026 14:02
