spanmetrics: drop invalid prometheus label names #5122

Merged
joe-elliott merged 19 commits into grafana:main from KyriosGN0:drop-invalid-metrics
Jul 3, 2025

@KyriosGN0
Contributor

@KyriosGN0 KyriosGN0 commented May 10, 2025

Signed-off-by: AvivGuiser <avivguiser@gmail.com>

What this PR does:

Which issue(s) this PR fixes:
Fixes #4434

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Collaborator

@joe-elliott joe-elliott left a comment

thank you for the PR, but i believe we are already sanitizing label names with SanitizeLabelNameWithCollisions ?

is this necessary as well?

@KyriosGN0
Contributor Author

> thank you for the PR, but i believe we are already sanitizing label names with SanitizeLabelNameWithCollisions ?
>
> is this necessary as well?

From reading what that function does, it seems that it checks whether labels are also used as intrinsics and, if so, adds `__` to those labels.

@joe-elliott
Collaborator

@KyriosGN0
Contributor Author

> they're also being passed through here: main/modules/generator/processor/spanmetrics/spanmetrics.go#L73
>
> which is Prometheus's sanitize func: prometheus/prometheus@main/util/strutil/strconv.go#L45-L47

I see. In that case we can make the regex simpler and cover only the case that this function doesn't, since it states that it doesn't cover all of Prometheus's requirements. The scenario is very limited: from what I understand, it only happens if we enable InfoTarget metrics (or if someone deliberately adds a dimension that violates the Prometheus standard).
What do you think?

@joe-elliott
Collaborator

I'm concerned about running another regex on every label that goes through metrics generator. I see two options, but there are likely more:

  1. We could try SanitizeFullLabelName which seems to handle the "can't be prefixed with numbers" situation. We'd need benchmarks over different strings (valid, invalid, etc) to confirm it wouldn't be a nasty regression.

  2. Since the only rule not covered by the above sanitize is the "can't be prefixed with numbers" we could simply check if the first rune of the string is a number and, if so, reject it. This would be cheaper than running a regex.
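
Option 2 above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR; the name `startsWithDigit` is hypothetical. It inspects only the first byte, which is equivalent to checking the first rune here because ASCII digits are single-byte in UTF-8.

```go
package main

// startsWithDigit reports whether name begins with an ASCII digit (0-9).
// Hypothetical helper: a label name that passes the existing sanitizer can
// still be rejected by Prometheus if it starts with a digit, so this single
// comparison covers the remaining rule without running a regex.
func startsWithDigit(name string) bool {
	if len(name) == 0 {
		return false
	}
	c := name[0]
	return c >= '0' && c <= '9'
}
```

A caller would drop (or skip) any label for which `startsWithDigit` returns true before registering it.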

@KyriosGN0
Contributor Author

> I'm concerned about running another regex on every label that goes through metrics generator. I see two options, but there are likely more:
>
>   1. We could try SanitizeFullLabelName which seems to handle the "can't be prefixed with numbers" situation. We'd need benchmarks over different strings (valid, invalid, etc) to confirm it wouldn't be a nasty regression.
>   2. Since the only rule not covered by the above sanitize is the "can't be prefixed with numbers" we could simply check if the first rune of the string is a number and, if so, reject it. This would be cheaper than running a regex.

I think option 2 is easier and covers the point of the issue. I will rewrite the PR accordingly.

@KyriosGN0 KyriosGN0 requested a review from ie-pham as a code owner June 7, 2025 18:39
@KyriosGN0
Contributor Author

@joe-elliott i have addressed your comments

@KyriosGN0 KyriosGN0 force-pushed the drop-invalid-metrics branch from a0afe50 to 5601638 Compare June 7, 2025 18:51
@KyriosGN0 KyriosGN0 requested a review from joe-elliott June 7, 2025 19:03
@KyriosGN0
Contributor Author

@joe-elliott fixed your comments

Signed-off-by: AvivGuiser <avivguiser@gmail.com>
@KyriosGN0 KyriosGN0 force-pushed the drop-invalid-metrics branch from 2582d9a to cc81db7 Compare June 18, 2025 18:29
if resourceAttributesCount > 0 && len(targetInfoLabels) > resourceAttributesCount {
	p.spanMetricsTargetInfo.SetForTargetInfo(targetInfoRegistryLabelValues, 1)
}
validatePromLabelNames(&labels, &labelValues)
Collaborator

shouldn't this be done on targetInfoLabels and values? actually, let's dig deeper than that.

targetInfoLabels are copied from resourceLabels which are retrieved from GetTargetInfoAttributeValues.

Why don't we just skip these invalid labels in the GetTargetInfoAttributeValues implementation?
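
The suggestion above, skipping invalid labels at the point where the target info attributes are collected, might look like the sketch below. The `attr` type, the `targetInfoLabels` function, and the `sanitize` parameter are hypothetical stand-ins; the real GetTargetInfoAttributeValues signature is not shown in this thread and may differ.

```go
package main

// attr is a hypothetical stand-in for a resource attribute key/value pair.
type attr struct {
	Key   string
	Value string
}

// targetInfoLabels collects label names and values from attrs, skipping any
// attribute whose key starts with an ASCII digit. Such keys would still be
// invalid Prometheus label names after character sanitization, because a
// digit is a valid character everywhere except the first position.
// sanitize is whatever character-level sanitizer the caller already uses.
func targetInfoLabels(attrs []attr, sanitize func(string) string) (keys, values []string) {
	keys = make([]string, 0, len(attrs))
	values = make([]string, 0, len(attrs))
	for _, a := range attrs {
		if len(a.Key) > 0 && a.Key[0] >= '0' && a.Key[0] <= '9' {
			continue // drop the whole key/value pair so both slices stay aligned
		}
		keys = append(keys, sanitize(a.Key))
		values = append(values, a.Value)
	}
	return keys, values
}
```

Filtering at collection time keeps the key and value slices aligned, so nothing downstream needs to validate the labels again.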

Contributor Author

Seems fair. But that function also passes labels through the sanitize function mentioned before, so if we are touching it, I think it would be better to switch it to the full SanitizeFullLabelName function from the Prometheus lib.

I will try to run the benchmark test locally with that change.

Contributor Author

I have tested locally and the tests pass, but I can't make the benchmark test work on my machine (docker-compose isn't working since the load test image is missing).
Is there an updated guide on how to run the benchmark on macOS?

Collaborator

You shouldn't need Docker to run benchmarks. Just create a new benchmark that focuses on the change:

func BenchmarkFoo(b *testing.B) {
  ...
}

and run it:

go test -run xxx -bench BenchmarkFoo -benchmem -count=5 > after.txt

Generate a before.txt and an after.txt, and then you can use benchstat to get a summary:

benchstat before.txt after.txt

I will say this change is a lot more complex and it would be a lot safer to just ignore the integer prefixed metrics in the GetTargetInfoAttributeValues func
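
As a rough illustration of the workflow described above, a focused benchmark for the cheaper digit-prefix check could look like this. The helper `hasDigitPrefix` and the benchmark name are hypothetical. In a real repository the benchmark would live in a `_test.go` file under the package being tested, and `go test` only picks up functions whose names start with `Benchmark`.

```go
package main

import "testing"

// hasDigitPrefix is a hypothetical helper under test: it reports whether a
// label name starts with an ASCII digit.
func hasDigitPrefix(name string) bool {
	return len(name) > 0 && name[0] >= '0' && name[0] <= '9'
}

// sink keeps the result alive so the compiler cannot optimize the call away.
var sink bool

// BenchmarkHasDigitPrefix cycles through valid, invalid, and empty names.
func BenchmarkHasDigitPrefix(b *testing.B) {
	inputs := []string{"service_name", "1st_attr", ""}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = hasDigitPrefix(inputs[i%len(inputs)])
	}
}
```

Running it once on the base branch into before.txt and once on the change into after.txt, with the same `-bench`, `-benchmem`, and `-count` flags, gives benchstat two comparable inputs.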

Contributor Author

@KyriosGN0 KyriosGN0 Jun 25, 2025

Aside from performance degradation, what else is at risk here? It's not as if labels that break the other rules could be ingested into Prometheus.
EDIT: Never mind. After reading the Prometheus function I see it makes more behavioral changes, so I won't use it right now.

Contributor Author

@joe-elliott addressed your comments: I have added a simple check in the if condition to skip keys that start with a number.

…m function

Signed-off-by: AvivGuiser <avivguiser@gmail.com>
@KyriosGN0 KyriosGN0 requested a review from joe-elliott June 25, 2025 15:01
Signed-off-by: AvivGuiser <avivguiser@gmail.com>
 }

-c := reclaimable.New(strutil.SanitizeLabelName, 10000)
+c := reclaimable.New(strutil.SanitizeFullLabelName, 10000)
Collaborator

this is so close!

My last concern is this change here. SanitizeFullLabelName creates a string builder and reallocates the string every time. SanitizeLabelName uses a regex to replace all invalid chars with _. We can either change it back, or I'd need to see a benchmark of these two methods to confirm there isn't a regression here.

If you do write a benchmark, let's make sure to bench both cases, with and without substitutions, on both funcs.
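
A benchmark along the lines requested above might be structured as in this sketch. To keep it self-contained it does not import the Prometheus library: `regexSanitize` and `builderSanitize` are local stand-ins that only approximate the two approaches as described in this thread (regex replacement versus rebuilding the string with a builder), so numbers from it say nothing definitive about the real functions. All names are hypothetical.

```go
package main

import (
	"regexp"
	"strings"
	"testing"
)

// invalidLabelChar matches any character not allowed in a label name.
var invalidLabelChar = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// regexSanitize is a stand-in for the regex approach: invalid characters
// become "_", but a leading digit is left untouched.
func regexSanitize(name string) string {
	return invalidLabelChar.ReplaceAllString(name, "_")
}

// builderSanitize is a stand-in for the builder approach: the string is
// rebuilt rune by rune, and a leading digit is also replaced with "_".
func builderSanitize(name string) string {
	var sb strings.Builder
	for i, r := range name {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_'
		isDigit := r >= '0' && r <= '9'
		if isLetter || (isDigit && i > 0) {
			sb.WriteRune(r)
		} else {
			sb.WriteByte('_')
		}
	}
	return sb.String()
}

// sanitized keeps results alive so the calls are not optimized away.
var sanitized string

// BenchmarkSanitizers runs both stand-ins on an input that needs no
// substitution and on one that does, as separate sub-benchmarks.
func BenchmarkSanitizers(b *testing.B) {
	funcs := []struct {
		name string
		fn   func(string) string
	}{
		{"regex", regexSanitize},
		{"builder", builderSanitize},
	}
	inputs := []struct {
		name  string
		value string
	}{
		{"no_substitution", "service_name"},
		{"substitution", "service.name-v2"},
	}
	for _, f := range funcs {
		for _, in := range inputs {
			b.Run(f.name+"/"+in.name, func(b *testing.B) {
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					sanitized = f.fn(in.value)
				}
			})
		}
	}
}
```

Reporting allocations per sub-benchmark makes it possible to see whether the builder variant allocates on inputs that need no substitution, which is the regression being asked about.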

Contributor Author

@joe-elliott forgot to revert it, fixed

Signed-off-by: AvivGuiser <avivguiser@gmail.com>
@joe-elliott
Collaborator

That jsonnet check thing is failing on my branch too. I'm currently looking into it.

@joe-elliott
Collaborator

this should fix it: #5349

Collaborator

@joe-elliott joe-elliott left a comment

let's go!

@joe-elliott joe-elliott merged commit 5fe86d4 into grafana:main Jul 3, 2025
64 of 67 checks passed
@joe-elliott
Collaborator

Nice fix 🙏 Thank you for your patience and responsiveness to reviews.

Development

Successfully merging this pull request may close these issues.

Metrics-generator can generate metrics with invalid labels

2 participants