Test that processes with identical tags are deduped #708
Signed-off-by: Yuri Shkuro <ys@uber.com>
|
Thanks for this test, could you elaborate on why the process hash needs to be resilient to changes in tag ordering? Further, toDomain doesn't even make use of the jaeger/model/converter/json/to_domain.go Lines 182 to 194 in fcbd210 |
Because that's how it's used in the UI converter to dedupe Process entries, and in practice that is the only reason the process.hash() function even exists. This is a piece of tribal knowledge that lives in my head and should be encoded in the tests. Then if we want to change that behavior (and/or fix the bugs where NewProcess is not used), it's a different change. |
I agree that the UI converter dedupes by putting items into a map as seen here: jaeger/model/converter/json/from_domain.go Lines 178 to 182 in e86551d Is this the deduping mechanism you are talking about?
I agree, could you elaborate on what guarantees that the model.Span being passed to the UI converter from storage has tags in sorted order? I couldn't find any code in the spanstore writers or readers that orders process tags, and I am having trouble understanding where this exists. Should we continue further discussion in #693 instead? |
Why is this relevant to this PR? The |
How do we know that it is making this implicit assumption? It seems to be working fine when this assumption is violated. Is this assumption required?
Because without that guarantee, the assumption is invalid. |
The assumption is obvious from the code: |
Could you please reply to this earlier question?
Could you please elaborate? |
It does not care about ordering (as this test demonstrates); it depends on the hashcode being the same for processes with the same tags. Our collection pipeline does not guarantee that the tag ordering is preserved (and even if it did, as some side effect, it would be poor design for this module to depend on it). |
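To make the point about hash stability concrete, here is a minimal sketch. It assumes simplified, hypothetical `KeyValue` and `Process` types and an FNV-based `Hash` method; these are stand-ins for illustration and not the actual Jaeger model code. The hash itself is order-sensitive, so it only agrees across tag orderings when the constructor sorts the tags first.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// KeyValue and Process are simplified, hypothetical stand-ins for the
// model types discussed in this thread.
type KeyValue struct {
	Key, Value string
}

type Process struct {
	ServiceName string
	Tags        []KeyValue
}

// NewProcess copies and sorts the tags (by key, then value) so that two
// processes with the same tag set always store them in the same order.
func NewProcess(service string, tags []KeyValue) *Process {
	sorted := append([]KeyValue(nil), tags...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Key != sorted[j].Key {
			return sorted[i].Key < sorted[j].Key
		}
		return sorted[i].Value < sorted[j].Value
	})
	return &Process{ServiceName: service, Tags: sorted}
}

// Hash feeds the fields to FNV-1a in stored order, so the result depends
// on tag order unless the tags were sorted at construction time.
func (p *Process) Hash() uint64 {
	h := fnv.New64a()
	h.Write([]byte(p.ServiceName))
	h.Write([]byte{0}) // separator so adjacent fields cannot run together
	for _, t := range p.Tags {
		h.Write([]byte(t.Key))
		h.Write([]byte{0})
		h.Write([]byte(t.Value))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

func main() {
	a := NewProcess("svc", []KeyValue{{"host", "h1"}, {"ip", "1.2.3.4"}})
	b := NewProcess("svc", []KeyValue{{"ip", "1.2.3.4"}, {"host", "h1"}})
	fmt.Println(a.Hash() == b.Hash()) // prints "true"
}
```

A `Process` built directly as a struct literal, bypassing the sorting constructor, keeps whatever order the caller supplied, which is the case the thread is arguing about.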
My argument is that we are doing exactly this.
Could you clarify on how this test demonstrates that The test always creates processes with tags in a sorted order because it uses Perhaps using |
That's precisely why this test is needed, to record this implicit dependency. If we want to change how NewProcess works we need to refactor UI converter to do things differently. |
I'm having trouble wrapping my head around how the UI converter can have an implicit dependency on Could you please elaborate? Also, as you mentioned earlier, we are operating under the assumption that the storage and retrieval pipelines make no guarantees on ordering. |
|
If it invoked NewProcess, the dependency would be explicit. The implicit dependency is that it expects the data that reaches the UI converter to be constructed in such a way that hashcode() is stable for the same process regardless of the tag order. |
It is not clear to me where the code expects hashcode() to be stable for the same process regardless of the tag order. The code sample that you posted earlier doesn't demonstrate this requirement. Could you elaborate? The current collection pipeline doesn't sort tags for the jaeger model for both Cassandra and ES, and things appear to be working fine. Could you help me understand by providing a concrete example that shows that hashcode needs to be stable for arbitrary ordering of tags? |
What else does this code expect from the hash() function? To get a random number? To get a value of the pointer? Or to get a stable value so that
What is your evidence that "things appear to be working fine"? Deduping of the Process in the UI model is a feature that is currently broken. The fact that UI handles it gracefully doesn't make it unbroken. |
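As an illustration of what "deduping" means here, the following sketch collects processes into a map keyed by their hash. The `Tag`, `Proc`, `newProc`, and `dedupe` names are hypothetical and the hash is a plain FNV-1a over the stored fields; none of this is the actual converter code. When every process goes through the sorting constructor, processes with identical tag sets collapse into one map entry.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Tag and Proc are hypothetical, simplified types used only for this sketch.
type Tag struct{ Key, Value string }

type Proc struct {
	Service string
	Tags    []Tag
}

// newProc sorts a copy of the tags by key so storage order is canonical.
// (Keys are assumed unique within one process for this illustration.)
func newProc(service string, tags []Tag) *Proc {
	sorted := append([]Tag(nil), tags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	return &Proc{Service: service, Tags: sorted}
}

// hash walks the fields in stored order; it is only order-independent
// when the tags were sorted at construction time.
func (p *Proc) hash() uint64 {
	h := fnv.New64a()
	h.Write([]byte(p.Service + "\x00"))
	for _, t := range p.Tags {
		h.Write([]byte(t.Key + "\x00" + t.Value + "\x00"))
	}
	return h.Sum64()
}

// dedupe keeps one entry per distinct hash value.
func dedupe(procs []*Proc) map[uint64]*Proc {
	out := make(map[uint64]*Proc)
	for _, p := range procs {
		out[p.hash()] = p
	}
	return out
}

func main() {
	procs := []*Proc{
		newProc("svc", []Tag{{"host", "h1"}, {"zone", "z1"}}),
		newProc("svc", []Tag{{"zone", "z1"}, {"host", "h1"}}),
	}
	fmt.Println(len(dedupe(procs))) // prints "1"
}
```

If the same two processes were built as raw struct literals with their tags in different orders, they would be expected to land under different keys, so the map would keep both and the dedupe would silently stop working.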
|
And fwiw, to make sure that the deduping feature is correct again, we just need to change 2 more lines to use NewProcess: |
I'm confused. While I understand the purpose of the hash function, I don't understand where the implicit dependency of requiring sorted tags comes from. I don't understand how the ordering of these tags can change during an invocation of
I had just checked the UI
I agree
This wasn't obvious to me, nor is it documented. Why did you not share this information earlier? Seeing that this feature is already broken, what did you mean by "This breaks things" on #693? |
This PR is not about sorted tags. It's about capturing the expectations of |
|
As discussed offline, the UI converter should be refactored to not depend on the order of tags provided by the caller (#713). |
The UI converter dedupes Process objects using the hash code, which is currently independent of the tag order (provided the NewProcess constructor is used). Whether relying on NewProcess to sort the tags is the right or wrong thing, this PR simply enhances the tests to demonstrate the existing dependency.
Signed-off-by: Yuri Shkuro ys@uber.com