tests: reduce integ test flakiness + improve debugability #5282
Conversation
# Always print the results
if p.results:
Okay so bear with me: ManagedProcess is async. We don't print stdout / stderr until after the results for a process are marked ready. Once the results for one process are marked ready, a test can assert the success of that process. If that assertion fails, we may exit the test without printing stdout / stderr. Usually this just results in failure to print stdout / stderr for one process, but because we mark the results ready BEFORE doing the print, theoretically we could fail to print stdout / stderr for both processes.
Rather than trying to clean up that async mess, I moved the printing here. This logic should ALWAYS trigger, regardless of failed assertions. Handling teardown logic is one of the specific responsibilities of fixtures.
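A minimal sketch of the fixture idea, using subprocess.Popen as a stand-in for ManagedProcess (the names and structure here are illustrative, not the actual s2n-tls fixture):

```python
import subprocess

import pytest


@pytest.fixture
def managed_process():
    procs = []

    def _start(cmd):
        p = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        procs.append(p)
        return p

    yield _start

    # Fixture teardown runs even if the test body raised an AssertionError,
    # so stdout / stderr are always printed for every managed process.
    for p in procs:
        out, err = p.communicate()
        if out:
            print(out)
        if err:
            print(err)
```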
4. Client s2n_recv reads ApplicationData
5. Client s2n_renegotiate sends ClientHello
This is why I had to remove the select from the renegotiate method in echo.c. That select assumes there will be ApplicationData to read, but the previous call to s2n_recv that read the HelloRequest might also have read the ApplicationData.
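To make that failure mode concrete, here is a hedged sketch using plain sockets (not the actual echo.c logic): if the peer coalesces the HelloRequest and the ApplicationData into one flight, the first recv() can consume both, and a later select() waiting for the ApplicationData blocks until it times out.

```python
import select
import socket


def read_two_messages(sock: socket.socket) -> list[bytes]:
    # The removed pattern, sketched: wait for readability before each
    # expected message. If the first recv() already pulled both messages
    # off the wire, the second select() never fires even though the
    # data was delivered.
    messages = []
    for _ in range(2):
        readable, _, _ = select.select([sock], [], [], 5.0)
        if not readable:
            raise TimeoutError("data may have been consumed by an earlier recv()")
        messages.append(sock.recv(4096))
    return messages
```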
# In order to test the case where application data is received during renegotiation,
# we must verify that the data was received after renegotiation started but before the new handshake finished.
reneg_starts = stdout_str.find(S2N_RENEG_START_MARKER)
reneg_finishes = stdout_str.find(S2N_RENEG_SUCCESS_MARKER)
assert to_marker(first_server_app_data) in stdout_str[reneg_starts:reneg_finishes]
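One debugability note on the marker check above (a sketch, not part of this diff): str.find() returns -1 when a marker is missing, which silently shrinks the slice to a near-empty window instead of producing a useful failure. Asserting the indices first makes a missing marker obvious:

```python
reneg_starts = stdout_str.find(S2N_RENEG_START_MARKER)
reneg_finishes = stdout_str.find(S2N_RENEG_SUCCESS_MARKER)
# find() returns -1 for a missing marker; fail loudly rather than
# letting the slice below quietly become (almost) empty.
assert reneg_starts != -1, "renegotiation start marker not in stdout"
assert reneg_finishes != -1, "renegotiation success marker not in stdout"
assert to_marker(first_server_app_data) in stdout_str[reneg_starts:reneg_finishes]
```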
If we can't reliably assert that we're testing what we think we are, I'm not sure I see much value in keeping the test. If we cause this test to break and it only results in flakiness, it seems unlikely that we'll actually catch the issue? Although I guess it doesn't hurt to keep the test anyway?
I wonder how often it's testing the wrong behavior though. Like, could we keep running this test until this assertion is met, and fail the test if it isn't met after a few runs or something?
I mean, it basically never failed in CI before this week. But with stress running, I put the check back and ran the tests a few times:
5 passed before first failure
9 passed before first failure
1 passed before first failure
0 passed before first failure
2 passed before first failure
1 passed before first failure
16 passed before first failure
There are 171 test cases :)
So the answer is that it usually works under normal conditions. But when the conditions get weird, the order of messages can get weird, and it has a high chance of failing.
Like, could we keep running this test until this assertion is met, and fail the test if it isn't met after a few runs or something?
That would make sense to me if there was a well-defined X percent chance of failure so we could choose the number of test runs such that the actual failure chance is known and acceptably small. But this is entirely "whims of the network". The assertion could fail 100% of the time.
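(To make that concrete: if each run independently failed a fixed 10% of the time, retrying five times would leave only a 0.1^5 = 0.001% chance of a spurious failure. With no stable failure rate, no retry count gives a bound like that.)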
I'd be fine with just deleting the test though. @maddeleine thoughts?
Yeah, I don't like this. We're essentially flying blind without these asserts. This is unit tested, so we should just delete the test.
We need a different type of integ testing that is deterministic (e.g., not fiddling around with processes). Any of our sufficiently complex IO tests could go there, because we literally cannot get this right with our current framework.
Release Summary:
Resolved issues:
Description of changes:
Our integ tests have been failing more often lately. I suspect it's a Codebuild noisy neighbor problem. The main tests I see fail are "test_renegotiate" and "test_serialization".
I ran the integ tests on an ec2 instance also running stress -c 4 -i 4 -m 4 -d 4. It was almost comically effective: I was able to consistently repro the failures from the CI, plus more. The downside is we're completely unable to use DHE (:
Initially, I focused my investigation on test_renegotiate. I attempted to explain my fix for test_renegotiate in a code comment; see if that makes sense.
Call-outs:
See #5282 (comment) for an attempt to explain what I'm doing with the stdout / stderr printing logic.
Testing:
The integ tests are still flaky because "test_serialization" is still flaky. But they should not fail for "test_renegotiate". test_renegotiate passes locally with stress running (and DHE cipher suites disabled), and I have not seen a test_renegotiate failure in the CI.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.