Description
Coordinated omission is a term coined by Gil Tene to describe what happens when a measuring system inadvertently coordinates with the system being measured in a way that avoids measuring outliers.
One example of how this can happen is a load tester that waits to send a request until the previous one has completed. If the load tester is targeting 10 req/s (one request every 100ms) and a request normally takes 50ms, each request will return before the next one is due to be sent. However, if the whole system occasionally pauses for 5 seconds, the load tester will not send any requests during that 5 second period, and the load test will record a single bad outlier that took 5 seconds.
If the load tester had kept firing requests at a constant 10 req/s, it would have made 50 requests during the 5 second pause. These requests were omitted. Had they been sent during the pause, the latency percentiles would look very different and would more accurately capture the system's behaviour under load.
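To make the difference concrete, here is a minimal simulation sketch in Python. It is not how artillery or any other tool works internally. It assumes a hypothetical server that answers in 50ms, except that any request sent during a single 5 second stall only completes after the stall ends. All names and constants (`completion_time`, `closed_loop`, `open_loop`, the stall window) are made up for illustration, and time is simulated in integer milliseconds, so nothing actually sleeps.

```python
# Simulated comparison of a coordinated (closed-loop) load tester with a
# constant-rate (open-loop) one. Everything here is a hypothetical model.

INTERVAL_MS = 100        # 10 req/s target rate
SERVICE_MS = 50          # normal response time
STALL_START_MS = 10_000  # assumed: the system pauses at t = 10s ...
STALL_END_MS = 15_000    # ... and resumes at t = 15s
DURATION_MS = 30_000     # length of the simulated test


def completion_time(sent_ms: int) -> int:
    """Requests sent during the stall are only served once it ends."""
    in_stall = STALL_START_MS <= sent_ms < STALL_END_MS
    start = STALL_END_MS if in_stall else sent_ms
    return start + SERVICE_MS


def closed_loop() -> list[int]:
    """Waits for each response before sending the next request."""
    latencies = []
    now = 0
    while now < DURATION_MS:
        done = completion_time(now)
        latencies.append(done - now)
        # The next send happens at the scheduled time or when the previous
        # response arrives, whichever is later. This is the coordination.
        now = max(now + INTERVAL_MS, done)
    return latencies


def open_loop() -> list[int]:
    """Sends on a fixed schedule regardless of outstanding responses."""
    return [completion_time(t) - t for t in range(0, DURATION_MS, INTERVAL_MS)]


def percentile(values: list[int], p: int) -> int:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    for name, run in (("closed loop", closed_loop), ("open loop", open_loop)):
        lat = run()
        slow = sum(1 for x in lat if x > SERVICE_MS)
        print(f"{name}: {len(lat)} requests, {slow} slow, "
              f"p50={percentile(lat, 50)}ms p99={percentile(lat, 99)}ms")
```

In this model the closed-loop run records exactly one slow request, while the constant-rate run records 50, which is enough to move the high percentiles a long way.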
There are a number of good videos and blog posts that discuss this in more detail (listed below). I was evaluating artillery and wanted to see whether it accounts for coordinated omission, but I couldn't find any discussion of it in the issues or the code. Is this something that artillery tries to prevent?
- How NOT to measure latency
- https://groups.google.com/forum/#!msg/mechanical-sympathy/icNZJejUHfE/BfDekfBEs_sJ
- http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html
- https://news.ycombinator.com/item?id=10486215
- http://psy-lob-saw.blogspot.com/2016/07/fixing-co-in-cstress.html
- https://bravenewgeek.com/benchmarking-message-queue-latency/
- https://github.com/giltene/wrk2#acknowledgements