Do not exceed ES _id length limit#905
Conversation
Signed-off-by: Tomasz Adamski <tomasz.adamski@gmail.com>
e0dfe2a to
6f43001
Compare
Codecov Report
@@ Coverage Diff @@
## master #905 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 126 126
Lines 6070 6074 +4
=====================================
+ Hits 6070 6074 +4
Continue to review full report at Codecov.
|
44ffbdf to
af89809
Compare
Instead of using arbitrary string (which length can be quite big) as document _id we compute sha256 from it instead. Signed-off-by: Łukasz Harasimowicz <dev@harnash.eu>
af89809 to
3489723
Compare
|
|
||
| import ( | ||
| "context" | ||
| "crypto/sha256" |
There was a problem hiding this comment.
there's no need for crypto hash, see
Lines 30 to 36 in d16de31
| } | ||
| serviceID := fmt.Sprintf("%s|%s", service.ServiceName, service.OperationName) | ||
| hashedID := fmt.Sprintf("%x", sha256.Sum256([]byte(serviceID))) | ||
| cacheKey := fmt.Sprintf("%s:%s", indexName, serviceID) |
There was a problem hiding this comment.
I don't see why we cannot use the hash for both document ID and cacheKey. Let's just have a single value.
Signed-off-by: Łukasz Harasimowicz <dev@harnash.eu>
c007a8e to
10fcf67
Compare
|
@yurishkuro thanks for the comments. I hopefully addressed them all in 10fcf67 |
| OperationName: "operation", | ||
| } | ||
|
|
||
| serviceHash, err := model.HashCode(service) |
There was a problem hiding this comment.
this introduces unnecessary public method to Service type. Instead I would simply add something like this:
func (s Service) hashCode() string {
h := fnv.New64a()
h.Write(s.ServiceName)
h.Write(s. OperationName)
return fmt.Sprintf("%x", h.Sum64())
}
Signed-off-by: Łukasz Harasimowicz <dev@harnash.eu>
| } | ||
|
|
||
| func (s Service) hashCode() string { | ||
| h := fnv.New64a() |
There was a problem hiding this comment.
@black-adder It could be but it will introduce issue with concurrency. Every simultaneous call to this function would potentially cause race condition and would render unpredictable hash results. I would need to introduce some kind of semaphore and mark this part as critical section. Do you think it is worth?
There was a problem hiding this comment.
unpredictable hash results -> shouldn't this always produce the same hash results for the same key even if run concurrently? I actually dont know the inner workings of the implementation so maybe completely wrong, we can skip for now but can you add a TODO to investigate in the future (unless you already know that there will be race conditions).
There was a problem hiding this comment.
@black-adder so my reasoning is like this.
- You store cached hasher object somewhere (most likely in the writer.go) and use it whenever someone calls
hashCode() - First process (1) calls
hashCode(), fetches (creates if not cached) hasher object - (1) Pushes data:
OperationNameandServiceNameto the hasher - Second process (2) calls
hashCode()which fetches the same (cached) hasher object - (2) pushes its
OperationNameandServiceNameto the same hasher - (1) calculates checksum by calling
Sum64()on the same hasher (which now holds data from two possibly different instances ofService) - (2) does the calculations too and gets the same (wrong) hash as (1)
I did actually run into this issue while fixing the tests :-)
Even if I introduce hash.Reset() just after Sum64() or before data is pushed it the hasher I still have no guarantees that other process is not doing some operation on the cached object. Putting Write semaphore would probably do the trick but I think it is a bit overkill in this case.
I hope this helps.
| indexService.On("Index", stringMatcher(indexName)).Return(indexService) | ||
| indexService.On("Type", stringMatcher(serviceType)).Return(indexService) | ||
| indexService.On("Id", stringMatcher("service|operation")).Return(indexService) | ||
| indexService.On("Id", stringMatcher(serviceHash)).Return(indexService) |
There was a problem hiding this comment.
I'm not a big fan of using the same hash function here to test that the service was hashed, I'd prefer if you generated this hash once and compare against the actual hash string.
There was a problem hiding this comment.
I've removed calls to hashCode() and put static hash in those tests.
Signed-off-by: Łukasz Harasimowicz <dev@harnash.eu>
Which problem is this PR solving?
Short description of the changes
_id