Skip to content

Test that processes with identical tags are deduped#708

Merged
yurishkuro merged 2 commits intomasterfrom
ui-conv-tests
Feb 25, 2018
Merged

Test that processes with identical tags are deduped#708
yurishkuro merged 2 commits intomasterfrom
ui-conv-tests

Conversation

@yurishkuro
Copy link
Copy Markdown
Member

@yurishkuro yurishkuro commented Feb 21, 2018

The UI converter dedups Process objects using the hash code that is currently independent of the tags order (provided NewProcess ctor is used). Whether relying on NewProcess to sort the tags is the right or wrong thing, this PR simply enhances the tests to demonstrate the existing dependency.

Signed-off-by: Yuri Shkuro ys@uber.com

Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls
Copy link
Copy Markdown

coveralls commented Feb 21, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling dde462f on ui-conv-tests into 2557fa1 on master.

@vprithvi
Copy link
Copy Markdown
Contributor

Thanks for this test, could you elaborate on why the process hash needs to be resilient to changes in tag ordering?

Further, toDomain doesn't even make use of the NewProcess as seen in this code snippet:

func (td toDomain) convertProcess(process *json.Process) (*model.Process, error) {
if process == nil {
return nil, errors.New("Process is nil")
}
tags, err := td.convertKeyValues(process.Tags)
if err != nil {
return nil, err
}
return &model.Process{
Tags: tags,
ServiceName: process.ServiceName,
}, nil
}

@yurishkuro
Copy link
Copy Markdown
Member Author

why the process hash needs to be resilient to changes in tag ordering

Because that's how it's used in the UI converter to dedupe Process entries, and in practice is the only reason why process.hash() function even exists. This is just a piece of tribal knowledge that's in my head that should be encoded in the tests. Then if we want to change that behavior (and/or fix the bugs where NewProcess is not used), it's a different change.

@vprithvi
Copy link
Copy Markdown
Contributor

Because that's how it's used in the UI converter to dedupe Process entries,

I agree that the UI converter dedupes by putting items into a map as seen here:

func (fd fromDomain) convertProcesses(processes map[string]*model.Process) map[json.ProcessID]json.Process {
out := make(map[json.ProcessID]json.Process)
for key, process := range processes {
out[json.ProcessID(key)] = fd.convertProcess(process)
}

Is the the deduping mechanism you are talking about?

This is just a piece of tribal knowledge that's in my head that should be encoded in the tests.

I agree, could you elaborate on what guarantees that the model.Span being passed to the UI converter from storage has tags in sorted order?

I couldn't find any code in the spanstore writers or readers that order process tags; and am having trouble understanding where this exists.

Should we continue further discussion in #693 instead?

@yurishkuro
Copy link
Copy Markdown
Member Author

could you elaborate on what guarantees that the model.Span being passed to the UI converter from storage has tags in sorted order

Why is this relevant to this PR? The process_hashtable makes this implicit assumption, so we should encode it in the tests.

@vprithvi
Copy link
Copy Markdown
Contributor

The process_hashtable makes this implicit assumption

How do we know that it is making this implicit assumption? It seems to be working fine when this assumption is violated. Is this assumption required?

Why is this relevant to this PR?

Because without that guarantee, the assumption is invalid.

@yurishkuro
Copy link
Copy Markdown
Member Author

How do we know that it is making this implicit assumption? It seems to be working fine when this assumption is violated. Is this assumption required?

The assumption is obvious from the code:

https://github.com/jaegertracing/jaeger/blob/master/model/converter/json/process_hashtable.go#L41

@vprithvi
Copy link
Copy Markdown
Contributor

It seems to be working fine when this assumption is violated. Is this assumption required?

Could you please reply to this earlier question?

The assumption is obvious from the code:
https://github.com/jaegertracing/jaeger/blob/master/model/converter/json/process_hashtable.go#L41

Could you please elaborate?
While it is obvious that process_hashtable#getKey depends on an arbitrary ordering being preserved; it is not obvious to me how or where it assumes that process tags are in sorted order.

@yurishkuro
Copy link
Copy Markdown
Member Author

While it is obvious that process_hashtable#getKey depends on an arbitrary ordering being preserved; it is not obvious to me how or where it assumes that process tags are in sorted order.

It does not care about ordering (as this test demonstrates), it depends on the hashcode to be the same for processes with the same tags. Our collection pipeline does not guarantee that the tag ordering is preserved (and even if it did as some side-effect it would be a poor design for this module to depend on it).

@vprithvi
Copy link
Copy Markdown
Contributor

Our collection pipeline does not guarantee that the tag ordering is preserved (and even if it did as some side-effect it would be a poor design for this module to depend on it).

My argument is that we are doing exactly this.
There is no code in the UI converter that sorts tags.

It does not care about ordering (as this test demonstrates),

Could you clarify on how this test demonstrates that process_hashtable#getKey does not care about ordering?

The test always creates processes with tags in a sorted order because it uses NewProcess. Because the input is always sorted, it doesn't demonstrate that process_hashtable#getKey doesn't care about ordering.

Perhaps using &Process{} to create an arbitrarily ordered list of tags would better demonstrate it.

@yurishkuro
Copy link
Copy Markdown
Member Author

yurishkuro commented Feb 21, 2018

The test always creates processes with tags in a sorted order because it uses NewProcess

That's precisely why this test is needed, to record this implicit dependency. If we want to change how NewProcess works we need to refactor UI converter to do things differently.

@vprithvi
Copy link
Copy Markdown
Contributor

vprithvi commented Feb 21, 2018

That's precisely why this test is needed, to record this implicit dependency.

I'm having trouble wrapping my head around how UI convertor can have an implicit dependency on NewProcess when it does not invoke NewProcess.

Could you please elaborate?

> pwd
/Users/prithviraj/gocode/src/github.com/jaegertracing/jaeger/model/converter/json
> ag NewProcess                                                           
process_hashtable_test.go
28:	p1 := model.NewProcess("s1", []model.KeyValue{
31:	p1dup := model.NewProcess("s1", []model.KeyValue{
34:	p2 := model.NewProcess("s2", []model.KeyValue{
57:	p1 := model.NewProcess("s1", []model.KeyValue{
60:	p2 := model.NewProcess("s2", []model.KeyValue{

Also, as you mentioned earlier, we are operating under the assumption that the storage and retrieval pipelines make no guarantees on ordering.

@yurishkuro
Copy link
Copy Markdown
Member Author

if it invoked NewProcess, the dependency would be explicit. The implicit dependency is that it expects the data that gets to ui converter to be constructed in such a way that hashcode() is stable for the same process regardless of the tag order.

@vprithvi
Copy link
Copy Markdown
Contributor

it expects the data that gets to ui converter to be constructed in such a way that hashcode() is stable for the same process regardless of the tag order.

It is not clear to me where the code expects hashcode() to be stable for the same process regardless of the tag order. The code sample that you posted earlier doesn't demonstrate this requirement. Could you elaborate?

The current collection pipeline doesn't sort tags for the jaeger model for both Cassandra and ES, and things appear to be working fine. Could you help me understand by providing a concrete example that shows that hashcode needs to be stable for arbitrary ordering of tags?

@yurishkuro yurishkuro changed the title Test that process hash is resilient to tags order Test that process with identical tags are deduped Feb 22, 2018
@yurishkuro yurishkuro changed the title Test that process with identical tags are deduped Test that processes with identical tags are deduped Feb 22, 2018
@yurishkuro
Copy link
Copy Markdown
Member Author

It is not clear to me where the code expects hashcode() to be stable for the same process

// getKey assigns a new unique string key to the process, or returns
// a previously assigned value if the process has already been seen.
func (ph *processHashtable) getKey(process *model.Process) string {
	hash := ph.hash(process)
	if keys, ok := ph.processes[hash]; ok {

What else does this code expect from the hash() function? To get a random number? To get a value of the pointer? Or to get a stable value so that if the process has already been seen from the comment can be satisfied?

The current collection pipeline doesn't sort tags for the jaeger model for both Cassandra and ES, and things appear to be working fine.

What is your evidence that "things appear to be working fine"? Deduping of the Process in the UI model is a feature that is currently broken. The fact that UI handles it gracefully doesn't make it unbroken.

@yurishkuro
Copy link
Copy Markdown
Member Author

And fwiw, to make sure that the deduping feature is correct again, we just need to change 2 more lines to use NewProcess:

./plugin/storage/cassandra/spanstore/dbmodel/converter.go:169:	return &model.Process{
./model/converter/json/to_domain.go:190:	return &model.Process{

@vprithvi
Copy link
Copy Markdown
Contributor

What else does this code expect from the hash() function? To get a random number? To get a value of the pointer? Or to get a stable value so that if the process has already been seen from the comment can be satisfied?

I'm confused. While I understand the purpose of the hash function. I don't understand where the implicit dependency of requiring sorted tags comes from.

I don't understand how the ordering of these tags can change during an invocation of convertProcess.

What is your evidence that "things appear to be working fine"?

I had just checked the UI

The fact that UI handles it gracefully doesn't make it unbroken.

I agree

Deduping of the Process in the UI model is a feature that is currently broken.

This wasn't obvious to me, nor is it documented. Why did you not share this information earlier?

Seeing that this feature is already broken, what did you mean by "This breaks things" on #693?

@yurishkuro
Copy link
Copy Markdown
Member Author

I'm confused. While I understand the purpose of the hash function. I don't understand where the implicit dependency of requiring sorted tags comes from.

This PR is not about sorted tags. It's about capturing the expectations of processHashtable.getKey to generate the same string name for equivalent Process instances, as documented in its comments. What specifically do you object to?

@yurishkuro
Copy link
Copy Markdown
Member Author

As discussed offline, the ui converter should be refactored to not depend on the order of tags provided by the called (#713).

@yurishkuro yurishkuro merged commit b61b7ec into master Feb 25, 2018
@ghost ghost removed the review label Feb 25, 2018
@yurishkuro yurishkuro deleted the ui-conv-tests branch May 6, 2018 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants