Add new histogram to generator - messaging system latency#3453
Add new histogram to generator - messaging system latency#3453joe-elliott merged 16 commits intografana:mainfrom
Conversation
|
Thanks for putting some time into this. It's a fairly small change should be a pretty easy review, but we do need this to be configurable. We are currently trying to keep metrics generator series down and histograms tend to generate a lot of series. I will give this more time tomorrow. |
Just pushed some changes to allow controlling the metric from config. Also, the metric is initialized even if the config is set to |
joe-elliott
left a comment
There was a problem hiding this comment.
a few questions, but looking good!
also, we will need a changelog
|
|
||
| func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 { | ||
| // handling potential underflow of unit64s substraction | ||
| diff := int64(unixNanoEnd) - int64(unixNanoStart) |
There was a problem hiding this comment.
instead of casting we could check to see if diff > end for overflow. is that better?
There was a problem hiding this comment.
also, we recently made a change to protect span metrics from negative latencies:
https://github.com/grafana/tempo/pull/3412/files
@mdisibio should we make sure the same protections are here?
There was a problem hiding this comment.
instead of casting we could check to see if diff > end for overflow. is that better?
I agree we can make it cleaner,
Really liked your idea, very clever 🙃 My only concern about it is how clear it will be. After making the check if diff > end, if underflow did happen, meaning the subtraction yields a negative number - don't we want to return it?
If not, maybe we could make this function a bit more specific and move the negative values "protection" into it.. and then.. return 0?
It could also be that simple:
func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 {
if unixNanoStart > unixNanoEnd {
// To prevent underflow, return 0.
return 0
}
// Safe subtraction.
return float64(unixNanoEnd-unixNanoStart) / 1e9
}
There was a problem hiding this comment.
let's follow suit with the linked PR for consistency and return 0.
i think your suggested unixNanosDiffSec in this PR is perfect.
There was a problem hiding this comment.
so this one is resolved? 🤔
|
|
||
| func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 { | ||
| // handling potential underflow of unit64s substraction | ||
| diff := int64(unixNanoEnd) - int64(unixNanoStart) |
There was a problem hiding this comment.
instead of casting we could check to see if diff > end for overflow. is that better?
I agree we can make it cleaner,
Really liked your idea, very clever 🙃 My only concern about it is how clear it will be. After making the check if diff > end, if underflow did happen, meaning the subtraction yields a negative number - don't we want to return it?
If not, maybe we could make this function a bit more specific and move the negative values "protection" into it.. and then.. return 0?
It could also be that simple:
func unixNanosDiffSec(unixNanoStart uint64, unixNanoEnd uint64) float64 {
if unixNanoStart > unixNanoEnd {
// To prevent underflow, return 0.
return 0
}
// Safe subtraction.
return float64(unixNanoEnd-unixNanoStart) / 1e9
}
|
Thanks for the work. This is looking good. heads up you will need this commit in your branch (merge or rebase) to get CI to pass: |
Synced from upstream + added docs. |
joe-elliott
left a comment
There was a problem hiding this comment.
Looking good, just have a few small thoughts.
Also, what do you think about adding the ConnectionType to the expired edge metric? If someone is wanting to do service graph metrics through queues I'm concerned it will create a lot of expired edges. It might be nice for an operator to know if their expired edges are largely queues or synchronous calls.
…ardinality.md use present when possible Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
0b29823 to
7121ecb
Compare
adirmatzkin
left a comment
There was a problem hiding this comment.
Ready to merge this one? @joe-elliott
Just rebased 🙃
|
Thank you for updating the docs |
|
@joe-elliott I think we're ready 🥳 |
|
running CI! |
|
Thanks, Adir! |
* introduce new service-graph metric for messaging-system latency * added tests for new histogram values * fix linting * make new metric optional via config * fix typo * fix failing tests * add feature to changelog * negative times diff consistency - return 0 instead of negative * update docs * Update docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md use present when possible Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * change 1e9 to time const * added a reference to the "wait" config of the processor * fixed indentations and formatting stuff from rebasing * removed mistaken println found by linter --------- Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
What this PR does:
This PR adds a new metric calculated by the metrics-generator service-graph processor.
The metric is a histogram for the latency between 2 services communicating through a messaging system.
Which issue(s) this PR fixes:
Fixes #3232
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]