tests: reduce integ test flakiness + improve debuggability #5282
```diff
@@ -94,8 +94,29 @@ def _fn(
             # is nothing to capture here.
             pass
         finally:
-            # Whether the processes succeeded or not, clean then up.
             for p in processes:
+                # Always print the results
+                if p.results:
```
Comment on lines +98 to +99:

Okay, so bear with me: ManagedProcess is async. We don't print stdout/stderr until after the results for a process are marked ready. Once the results for one process are marked ready, a test can assert the success of that process. If that assertion fails, we may exit the test without printing stdout/stderr. Usually this just results in a failure to print stdout/stderr for one process, but because we mark the results ready BEFORE doing the print, theoretically we could fail to print stdout/stderr for both processes. Rather than trying to clean up that async mess, I moved the printing here. This logic should ALWAYS trigger, regardless of failed assertions. Handling teardown logic is one of the specific responsibilities of fixtures.
```diff
+                    width = 90
+                    padchar = "#"
+
+                    print(padchar * width)
+                    print(f" {p.cmd_line[0]} ".center(width, padchar))
+                    print(padchar * width)
+
+                    print(f"Command line:\n\t{' '.join(p.cmd_line)}")
+                    print(f"Exit code:\n\t {p.results.exit_code}")
+                    print("")
+
+                    print(" Stdout ".center(width, padchar))
+                    print(p.results.stdout.decode("utf-8", "backslashreplace"))
+                    print("")
+
+                    print(" Stderr ".center(width, padchar))
+                    print(p.results.stderr.decode("utf-8", "backslashreplace"))
+                    print("")
+
+                # Whether the processes succeeded or not, clean them up.
                 if aborted:
                     p.kill()
                 else:
```
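The review comment above leans on a pytest guarantee: code after a fixture's `yield` runs during teardown even when the test body raises. Here is a minimal, self-contained sketch of that pattern, with `FakeProcess` and `managed_processes` as hypothetical stand-ins for the real `ManagedProcess` machinery:

```python
import pytest


class FakeProcess:
    """Hypothetical stand-in for a managed process, just enough for the sketch."""

    class Results:
        def __init__(self, stdout, exit_code):
            self.stdout = stdout
            self.exit_code = exit_code

    def __init__(self, cmd_line, stdout=b"", exit_code=0):
        self.cmd_line = cmd_line
        self.results = FakeProcess.Results(stdout, exit_code)


@pytest.fixture
def managed_processes():
    processes = []
    try:
        # The test body runs at the yield; a failed assertion raises here.
        yield processes
    finally:
        # Teardown always runs, so output is printed even for failed tests.
        for p in processes:
            if p.results:
                print(f"Command line: {' '.join(p.cmd_line)}")
                print(f"Exit code: {p.results.exit_code}")
                print(p.results.stdout.decode("utf-8", "backslashreplace"))


def test_example(managed_processes):
    managed_processes.append(FakeProcess(["echo", "hi"], stdout=b"hi\n"))
    # Even if this assertion failed, the fixture would still print the output.
    assert managed_processes[0].results.exit_code == 0
```

Run with `pytest -s` to see the prints directly; with capture enabled, the output lands in pytest's captured-output sections instead.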
In the second file of the diff, the renegotiation test, the helper's hard-coded process timeout becomes a parameter:

```diff
@@ -191,6 +191,7 @@ def basic_reneg_test(
     provider,
     messages=RENEG_MESSAGES,
     reneg_option=None,
+    timeout=8,
 ):
     options = ProviderOptions(
         port=next(available_ports),
@@ -217,15 +218,15 @@ def basic_reneg_test(
         provider,
         server_options,
         send_marker=Msg.send_markers(messages, Provider.ServerMode),
-        timeout=8,
+        timeout=timeout,
     )
 
     s2n_client = managed_process(
         S2N,
         client_options,
         send_marker=Msg.send_markers(messages, Provider.ClientMode),
         close_marker=Msg.close_marker(messages),
-        timeout=8,
+        timeout=timeout,
     )
 
     return (s2n_client, server)
```
```diff
@@ -422,6 +423,49 @@ def test_s2n_client_renegotiate_with_client_auth_with_openssl(
 
 
+"""
+The s2n-tls client successfully reads ApplicationData during the renegotiation handshake.
+
+This test is best-effort, and flakiness could be a sign of problems with how
+s2n-tls handles ApplicationData during the renegotiation handshake.
+
+Ideally the actions in the test would be:
+1. Server sends the HelloRequest
+2. Client s2n_recv reads HelloRequest
+3. Client s2n_renegotiate sends ClientHello
+4. Server sends ApplicationData
+5. Server reads ClientHello, begins new handshake
+6. Server sends ServerHello
+7. Client s2n_renegotiate reads ApplicationData
+
+Notice that the ApplicationData appears between two handshake messages,
+so it is read by s2n_renegotiate instead of s2n_recv. That is the scenario
+we are attempting to test.
+
+But that means that the actions might occur in this order:
+1. Server sends the HelloRequest
+2. Server sends ApplicationData
+3. Client s2n_recv reads HelloRequest
+4. Client s2n_recv reads ApplicationData
+5. Client s2n_renegotiate sends ClientHello
```

Comment on the line above: This is why I had to remove the […]

```diff
+6. Server reads ClientHello, begins new handshake
+7. Server sends ServerHello
+
+You might think we could avoid this scenario by waiting to send the server
+ApplicationData until the server reads the ClientHello. However, at that point
+the server cannot send any ApplicationData because it has begun a new handshake.
+There is no other viable trigger for the ApplicationData, which is why this test
+is only best-effort.
+
+The actions might also occur in this order:
+1. Server sends the HelloRequest
+2. Client s2n_recv reads HelloRequest
+3. Client s2n_renegotiate sends ClientHello
+4. Server reads ClientHello, begins new handshake
+5. Server sends ServerHello
+6. ... rest of the handshake ...
+7. Server sends ApplicationData
+8. Client s2n_recv reads ApplicationData
+
+We can make this scenario less likely by forcing the client to wait a few seconds
+before sending the ClientHello, but it is still theoretically possible.
+"""
```
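To make the docstring's distinction concrete, here is a toy model of the race. Everything in it — the event names and the `reader_of_app_data` helper — is invented for illustration; it is not the real s2n-tls API. The single fact the test depends on is which side of the client's switch into (and back out of) s2n_renegotiate the server's ApplicationData lands on:

```python
def reader_of_app_data(timeline):
    """Toy model of which client call ends up reading the server's ApplicationData.

    "switch" marks the client moving from s2n_recv to s2n_renegotiate after
    seeing the HelloRequest; "handshake_done" marks the renegotiation
    handshake completing, after which the client is back in s2n_recv.
    """
    in_renegotiate = False
    for event in timeline:
        if event == "switch":
            in_renegotiate = True
        elif event == "handshake_done":
            in_renegotiate = False
        elif event == "app_data":
            return "s2n_renegotiate" if in_renegotiate else "s2n_recv"
    return None


# Ideal ordering: ApplicationData arrives mid-handshake, so s2n_renegotiate
# reads it -- the scenario the test is trying to exercise.
assert reader_of_app_data(
    ["hello_request", "switch", "app_data", "handshake_done"]
) == "s2n_renegotiate"

# Racy ordering #1: the data is buffered before the client reacts to the
# HelloRequest, so plain s2n_recv consumes it.
assert reader_of_app_data(
    ["hello_request", "app_data", "switch", "handshake_done"]
) == "s2n_recv"

# Racy ordering #2: the data only arrives after the new handshake finishes,
# so again s2n_recv consumes it. Delaying the ClientHello (S2N_RENEG_WAIT)
# makes this less likely but cannot rule it out.
assert reader_of_app_data(
    ["hello_request", "switch", "handshake_done", "app_data"]
) == "s2n_recv"
```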
```diff
@@ -440,7 +484,6 @@ def test_s2n_client_renegotiate_with_app_data_with_openssl(
     protocol,
     provider,
 ):
-    first_server_app_data = Msg.expected_output(RENEG_MESSAGES, Provider.ClientMode)[0]
     (s2n_client, server) = basic_reneg_test(
         managed_process,
         cipher,
@@ -449,6 +492,7 @@ def test_s2n_client_renegotiate_with_app_data_with_openssl(
         protocol,
         provider,
         reneg_option=S2N_RENEG_WAIT,
+        timeout=30,  # Set the timeout much higher than the wait
     )
 
     for results in server.get_results():
@@ -463,10 +507,3 @@ def test_s2n_client_renegotiate_with_app_data_with_openssl(
         for output in Msg.expected_output(RENEG_MESSAGES, Provider.ClientMode):
             assert output in results.stdout
         assert renegotiate_was_successful(results)
-        stdout_str = str(results.stdout)
-
-        # In order to test the case where application data is received during renegotiation,
-        # we must verify that the data was received after renegotiation started but before the new handshake finished.
-        reneg_starts = stdout_str.find(S2N_RENEG_START_MARKER)
-        reneg_finishes = stdout_str.find(S2N_RENEG_SUCCESS_MARKER)
-        assert to_marker(first_server_app_data) in stdout_str[reneg_starts:reneg_finishes]
```
Comment on lines -468 to -472:

If we can't reliably assert that we're testing what we think we are, I'm not sure I see much value in keeping the test. If we cause this test to break and it only results in flakiness, it seems unlikely that we'll actually catch the issue. Although I guess it doesn't hurt to keep the test anyway? I wonder how often it's testing the wrong behavior, though. Could we keep running this test until this assertion is met, and fail the test if it isn't met after a few runs, or something?

Reply: I mean, it basically never failed in CI before this week. But with […]. So the answer is that it usually works under normal conditions, but when the conditions get weird, the order of messages can get weird, and it has a high chance of failing.

Reply: That would make sense to me if there were a well-defined X percent chance of failure, so we could choose the number of test runs such that the actual failure chance is known and acceptably small. But this is entirely "whims of the network": the assertion could fail 100% of the time. I'd be fine with just deleting the test, though. @maddeleine, thoughts?

Reply: Yeah, I don't like this. We're essentially flying blind without these asserts. This is unit tested, so we should just delete the test.
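As an aside, the "rerun a few times" idea above is easy to express with the pytest-rerunfailures plugin; the `flaky` marker below is that plugin's real API, while the test body is a hypothetical stand-in for the marker-ordering assertion. Note the thread's caveat: reruns only help when each attempt has some fixed, independent failure probability p, so that all attempts fail with probability p^(1+reruns) — e.g. 0.2^4 ≈ 0.0016 for three reruns. "Whims of the network" breaks exactly that assumption, since p could be 1.

```python
import random

import pytest


# Requires the pytest-rerunfailures plugin: pip install pytest-rerunfailures.
# The test is retried up to 3 extra times and passes if any attempt passes.
@pytest.mark.flaky(reruns=3)
def test_app_data_read_during_renegotiation():
    # Hypothetical stand-in for "the network delivered the interesting
    # interleaving"; the real check would look for the first ApplicationData
    # between the renegotiation start and success markers in stdout.
    got_interesting_interleaving = random.random() < 0.8
    assert got_interesting_interleaving
```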