Clarifying concurrency-based SLA testing for rosetta endpoints #357
linconvidal
started this conversation in
Ideas
@matiwinnetou brought up a crucial perspective: we should focus explicitly on measuring how many concurrent virtual users each individual endpoint can handle before breaching a defined SLA threshold (say, p95 and p99 latencies below 1 second). In essence, this means testing endpoints individually and progressively increasing concurrency until we clearly identify the maximum number of simultaneous users each endpoint can reliably support. The endpoint that first breaches this latency SLA under load defines our overall concurrency limit. Establishing this simple yet clear initial metric gives us a foundation we can then iteratively refine.
TLDR: we’re trying to answer the question “How many concurrent users can the system handle and still stay below 1 second at p95 or p99?”
Currently, our artillery tests have been inspired by existing setups that primarily use phases based on arrival rates. This method aligns closely with Artillery's recommended testing philosophy, which emphasizes a hybrid workload model:
Artillery explicitly recommends this hybrid model to avoid the pitfalls of coordinated omission—a problem common with purely closed-loop (fixed concurrency) tests, where slow responses artificially reduce request rates, masking performance bottlenecks.
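As a rough illustration, an open-loop, arrival-rate-based phase of the kind our current tests use might look like the sketch below. The target URL, durations, rates, and request body are placeholders, not our actual configuration:

```yaml
# Hypothetical sketch of an open-loop (arrival-rate) Artillery test.
# New virtual users arrive at a fixed/ramping rate regardless of how
# slowly the server responds, which avoids coordinated omission.
config:
  target: "http://localhost:8082"   # assumed Rosetta deployment URL
  phases:
    - duration: 60        # seconds
      arrivalRate: 5      # start at 5 new virtual users per second
      rampTo: 50          # ramp to 50 arrivals/second by the end of the phase
      name: "ramp-up"
scenarios:
  - flow:
      - post:
          url: "/search/transactions"
          json: {}        # placeholder body; real Rosetta requests need a network_identifier
```

Note that nothing in this config bounds concurrency directly: if responses slow down, the number of in-flight virtual users simply grows, which is exactly why this style answers "requests per second" questions rather than "concurrent users" questions.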
However, this hybrid approach doesn't inherently measure concurrency levels explicitly. @matiwinnetou's request highlights precisely this nuance: we need clarity on concurrency limits rather than just request rates. Therefore, to clearly meet both Artillery’s philosophy and our explicit concurrency-measurement requirements, I propose a carefully structured adaptation:
Artillery's `maxVusers` phase parameter (which caps the maximum number of concurrent virtual users) lets us clearly and effectively simulate stable concurrency levels. I suggest conducting these tests incrementally, stepping up through concurrency levels (such as 1, 2, 4, 8, 12, 16… concurrent users) and observing precisely when and where we exceed our defined latency thresholds. This structured incremental testing clarifies exactly which endpoints handle load effectively and which ones need attention.
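A minimal sketch of such a stepped run, assuming a hypothetical local target (durations, rates, and the endpoint are placeholders): each phase offers far more arrivals than the cap allows, so `maxVusers` holds actual concurrency roughly constant at the level we want to measure.

```yaml
# Hypothetical sketch: stepping through fixed concurrency levels per endpoint.
# arrivalRate is set deliberately high so the pool stays saturated;
# maxVusers is what actually caps concurrency in each phase.
config:
  target: "http://localhost:8082"   # assumed Rosetta deployment URL
  phases:
    - duration: 60
      arrivalRate: 100
      maxVusers: 1
      name: "concurrency-1"
    - duration: 60
      arrivalRate: 100
      maxVusers: 2
      name: "concurrency-2"
    - duration: 60
      arrivalRate: 100
      maxVusers: 4
      name: "concurrency-4"
    # ...continue stepping (8, 12, 16, ...) while watching p95/p99 per phase
scenarios:
  - flow:
      - post:
          url: "/search/transactions"
          json: {}        # placeholder body; endpoint-specific in practice
```

Running one such script per endpoint, the first phase whose p95/p99 exceeds 1 second marks that endpoint's concurrency limit.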
The ideal outcome from such a test would look like this:
In this hypothetical scenario, we'd quickly see /search/transactions identified as our critical performance bottleneck, clearly limiting the overall concurrency of our system to 10 users. This clarity directs our immediate optimization efforts effectively.

I have documented this approach in more detail on the Wiki for future reference/discussion.