[metrics-generator] filter out spans based on policy #2274

Merged
zalegrala merged 44 commits into grafana:main from zalegrala:metricsGeneratorFiltering
May 2, 2023

Conversation

Contributor

@zalegrala commented Mar 29, 2023

What this PR does:

Here we implement an approach to filtering out spans based on a policy, loosely based on the OTEL collector filterspan config format.
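As a rough illustration of the filterspan-style shape the PR describes (key names here follow the OTEL collector convention and may differ from the final merged Tempo config), a policy might look like:

```yaml
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        # Keep only spans whose attributes match exactly.
        - include:
            match_type: strict
            attributes:
              - key: env
                value: prod
        # Drop spans whose attributes match a regex.
        - exclude:
            match_type: regex
            attributes:
              - key: http.route
                value: "/health.*"
```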

Which issue(s) this PR fixes:
Fixes #1482

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment thread cmd/tempo/app/config.go Outdated
Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
Comment thread modules/generator/processor/spanmetrics/config.go Outdated
@knylander-grafana
Contributor

Thank you for adding documentation!

Comment thread docs/sources/tempo/metrics-generator/span_metrics.md
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Comment thread docs/sources/tempo/metrics-generator/span_metrics.md Outdated
Contributor

@knylander-grafana left a comment


Docs look good! Thank you for adding them.

zalegrala force-pushed the metricsGeneratorFiltering branch from bf1a64f to dcdd76f on April 6, 2023 20:53
zalegrala marked this pull request as ready for review on April 11, 2023 16:48
zalegrala force-pushed the metricsGeneratorFiltering branch 2 times, most recently from bca1054 to 126f390 on April 12, 2023 15:56
Comment thread cmd/tempo/app/config.go Outdated
Comment thread modules/generator/overrides.go Outdated
spanMetricsCallsTotal registry.Counter
spanMetricsDurationSeconds registry.Histogram
spanMetricsSizeTotal registry.Counter
spanMetricsFilterDropsTotal registry.Counter
Collaborator


any particular reason we're pushing this and not just recording it as a normal prometheus metric in tempo?

Contributor Author


Nope, good call out.

Contributor Author


I spoke too soon. For a generator instance, we don't currently have the tenant ID available, so creating a metric that includes the tenant label isn't feasible without a bit of refactoring. This currently pushes the metric to the remote endpoint and would increase the series count, so perhaps this isn't something we want to do. If we instead host the metric on the generator itself, we'd be able to see which tenants are filtering which spans, but that likely wouldn't have the desired value for folks who don't have access to those metrics. That is, if a cloud user has a filter, they wouldn't be able to see how many spans their filter is rejecting, which seems like the primary utility. I'm a little torn on even including this, but it does seem like we want some indication that spans are being filtered out.

Collaborator


yeah, i really think we shouldn't be pushing this using remote write. the metrics we push are the generated ones. this describes the operational state of tempo which would just publish normally.

if a cloud user has a filter, they wouldn't be able to see how many spans are being rejected by their filter, which seems like the primary utility. I'm a little torn on even including this, but it does seem like we want some indication that the spans are being filtered out.

we can push this back through billing and expose it to the end user. i think we may just need to push the tenant id down into the instance. alternatively you can push a counter metric into the instance that already has the tenant id configured
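A minimal sketch of that last suggestion, assuming a per-instance counter keyed by tenant (the names `dropCounter`, `Inc`, and `Count` are illustrative stand-ins, not Tempo's registry API; a real implementation would register a Prometheus counter with a tenant label rather than this stdlib stand-in):

```go
package main

import (
	"fmt"
	"sync"
)

// dropCounter is a stand-in for an operational metric keyed by tenant.
// The generator instance already knows its tenant ID, so it can increment
// a per-tenant series directly instead of remote-writing it.
type dropCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newDropCounter() *dropCounter {
	return &dropCounter{counts: map[string]uint64{}}
}

// Inc records one filtered span for the given tenant.
func (c *dropCounter) Inc(tenant string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[tenant]++
}

// Count returns the number of filtered spans recorded for a tenant.
func (c *dropCounter) Count(tenant string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[tenant]
}

func main() {
	c := newDropCounter()
	c.Inc("tenant-a")
	c.Inc("tenant-a")
	c.Inc("tenant-b")
	fmt.Println(c.Count("tenant-a"))
}
```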

Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
}

for _, policy := range p.filterPolicies {
if policy.Include != nil {
Collaborator


nit: could move the policy.include check into policyMatch() which would clean this up.

Contributor Author


I'm not sure what policyMatch() should do in the case of a nil policy. I agree it would be a little cleaner, but include is the inverse of exclude here, so for example returning true from policyMatch() isn't as clean. Perhaps I can make exclude and include behave the same way, and then make the suggested adjustment.
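To make the inversion concrete, here is a hedged Go sketch of the include/exclude semantics under discussion (`FilterPolicy`, `PolicyMatch`, and `keepSpan` are hypothetical names for illustration, not the merged API):

```go
package main

import "fmt"

// FilterPolicy sketches the include/exclude shape discussed above.
type FilterPolicy struct {
	Include *PolicyMatch
	Exclude *PolicyMatch
}

// PolicyMatch lists attribute key/value pairs that must all match.
type PolicyMatch struct {
	Attributes map[string]string
}

// matches reports whether every attribute in the policy matches the span.
func matches(pm *PolicyMatch, spanAttrs map[string]string) bool {
	for k, v := range pm.Attributes {
		if spanAttrs[k] != v {
			return false
		}
	}
	return true
}

// keepSpan applies one policy: a nil Include keeps everything, while a
// matching Exclude drops the span. This is why folding the nil check into
// a single policyMatch() is awkward: include and exclude invert the result.
func keepSpan(p FilterPolicy, spanAttrs map[string]string) bool {
	if p.Include != nil && !matches(p.Include, spanAttrs) {
		return false
	}
	if p.Exclude != nil && matches(p.Exclude, spanAttrs) {
		return false
	}
	return true
}

func main() {
	span := map[string]string{"env": "prod"}
	drop := FilterPolicy{Exclude: &PolicyMatch{Attributes: map[string]string{"env": "prod"}}}
	keep := FilterPolicy{Include: &PolicyMatch{Attributes: map[string]string{"env": "prod"}}}
	fmt.Println(keepSpan(drop, span), keepSpan(keep, span))
}
```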

Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
Collaborator

@joe-elliott left a comment


one broader question that just occurred to me: should all of this logic apply to both span metrics and service graph metrics? should we move this up a level?

@zalegrala
Contributor Author

Thanks for the review @joe-elliott. If we move this up a level, do you suppose the processors should have an independent filter config, or share one? Sharing one would be simpler and likely more performant, but I wonder if folks would want to tune them independently.

@zalegrala
Contributor Author

In lieu of a specific idea about how widely to apply the filtering, I'm going to restructure the span filtering into pkg/spanfilter and implement it in the spanmetrics processor. I think this will give us the ability to reuse the code elsewhere if we want to pick it up later, while still giving us the short-term benefit of filtering spans in the spanmetrics processor now.

zalegrala force-pushed the metricsGeneratorFiltering branch from b2df1e4 to 6e15687 on April 14, 2023 15:22
zalegrala requested a review from joe-elliott on April 18, 2023 15:12
zalegrala force-pushed the metricsGeneratorFiltering branch from ebced35 to 7fb6e5f on April 20, 2023 15:27
Collaborator

@joe-elliott left a comment


Really liking the refactor to pull out the filters. Some thoughts, but this is close.

Comment thread docs/sources/tempo/metrics-generator/span_metrics.md

Comment thread modules/generator/processor/spanmetrics/spanmetrics.go Outdated
Comment thread pkg/spanfilter/spanfilter.go Outdated
}
matches++
case traceql.IntrinsicKind:
if !stringMatch(policy.MatchType, span.GetKind().String(), pa.Value.(string)) {
Collaborator


any way we can do int matches here and on status? would be much faster

Contributor Author


Probably yes, but it might be a little clumsy, since I think we'd need to take this as an int in the config. There is a special YAML struct tag we could use, iirc, to parse a string as an int, or perhaps some custom unmarshaling. For now I'm inclined to leave it and come back to it later.
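As a sketch of the int-match idea, the string form of the kind from the config could be resolved to the OTLP SpanKind enum value once at load time, leaving a plain int comparison on the per-span hot path (`compileKind` and `kindMatches` are assumed names for illustration; the real change may use custom YAML unmarshaling instead):

```go
package main

import "fmt"

// kindValues maps the config's string form of span kind to the OTLP
// SpanKind enum value, so matching can compare ints instead of strings.
var kindValues = map[string]int{
	"SPAN_KIND_UNSPECIFIED": 0,
	"SPAN_KIND_INTERNAL":    1,
	"SPAN_KIND_SERVER":      2,
	"SPAN_KIND_CLIENT":      3,
	"SPAN_KIND_PRODUCER":    4,
	"SPAN_KIND_CONSUMER":    5,
}

// compileKind resolves the configured string once, at config-load time.
func compileKind(s string) (int, error) {
	v, ok := kindValues[s]
	if !ok {
		return 0, fmt.Errorf("unknown span kind %q", s)
	}
	return v, nil
}

// kindMatches is the per-span hot path: a plain int comparison,
// avoiding a String() call and string compare per span.
func kindMatches(want, spanKind int) bool { return want == spanKind }

func main() {
	want, err := compileKind("SPAN_KIND_SERVER")
	if err != nil {
		panic(err)
	}
	fmt.Println(kindMatches(want, 2))
}
```

The same trick applies to status codes, which is the other field the reviewer calls out above.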

Contributor Author


I think I can work this. I'll take a closer look next week.

Contributor Author


I've added a commit for this. It made some parts less readable, but seems okay to me. How's that look?

zalegrala force-pushed the metricsGeneratorFiltering branch from 5214024 to 95fd22e on April 24, 2023 18:07
zalegrala and others added 24 commits on May 2, 2023 17:52
zalegrala force-pushed the metricsGeneratorFiltering branch from 1cec33b to 79151bb on May 2, 2023 17:53
zalegrala added 3 commits on May 2, 2023 18:40
zalegrala merged commit 14848fd into grafana:main on May 2, 2023
zalegrala deleted the metricsGeneratorFiltering branch on May 2, 2023 20:36
Successfully merging this pull request may close these issues:

Metrics-generator should filter out spans based upon attributes