feat: improve observability of collection failures in the metrics generator with error categorization #5936

Merged
javiermolinar merged 2 commits into main from feat-improve-collection-error-metric
Nov 13, 2025

Contributor

@javiermolinar javiermolinar commented Nov 12, 2025

What this PR does:
It improves the observability of collection errors in the metrics generator by categorizing Prometheus errors. The current metric lacks sufficient detail.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@javiermolinar javiermolinar changed the title from "feat: improve observability of collection failures with error categorization" to "feat: improve observability of collection failures in the metrics generator with error categorization" Nov 12, 2025
case errors.Is(err, tsdb.ErrInvalidSample),
errors.Is(err, tsdb.ErrInvalidExemplar):
return "invalid"
case errors.Is(err, storage.ErrOutOfOrderSample),
Contributor

Sounds good, but why group multiple error types under the same label?

Contributor Author

Mostly to keep the observability actionable. Mapping the Prometheus errors one-to-one can lead to cognitive overload; grouping them also maintains forward compatibility and keeps the cardinality low. We can always go to the logs to find the exact error.

errors.Is(err, storage.ErrCTNewerThanSample):
return "out_of_order"
case errors.Is(err, storage.ErrDuplicateSampleForTimestamp),
errors.Is(err, storage.ErrDuplicateExemplar):
Member

@electron0zero electron0zero Nov 12, 2025

nit: Exemplar seems like a fit for a dedicated ErrType instead of being clubbed with duplicate_sample.

Contributor Author

The idea is to have behavioral categories, not type-based ones.

@javiermolinar javiermolinar merged commit d021e6a into main Nov 13, 2025
51 of 53 checks passed
@javiermolinar javiermolinar deleted the feat-improve-collection-error-metric branch November 13, 2025 10:53
3 participants