Fixed Python TcpipSocketStream write loop locking up on unhandled socket errors #2865

Merged
mcosgriff merged 2 commits into main from 2856-interface-locking-up-when-encountering-unchecked-socket-errno-in-python on Feb 24, 2026

Conversation

@mcosgriff
Contributor

  • Fixed an infinite loop in TcpipSocketStream.write(): an OSError with an errno other than EAGAIN/EWOULDBLOCK (e.g. ECONNRESET, EPIPE) was silently swallowed, so the interface locked up indefinitely and could only be recovered by manually restarting the microservice.
  • Added else: raise to re-raise the original OSError immediately. This matches the Ruby implementation, which rescues only EAGAIN/EWOULDBLOCK and lets all other errors propagate.
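
To make the failure mode concrete, here is a minimal sketch of a non-blocking write loop with the else: raise branch in place. It is not the code from tcpip_socket_stream.py; the function name write_all, its timeout parameter, and the use of select are assumptions made for illustration only.

```python
import errno
import select
import time


def write_all(sock, data, timeout=None):
    """Hypothetical write loop: send all of `data` or raise.

    Only EAGAIN/EWOULDBLOCK are treated as "try again later". Every other
    OSError is re-raised, so the loop cannot spin on a dead socket.
    """
    view = memoryview(data)
    total_sent = 0
    deadline = None if timeout is None else time.monotonic() + timeout
    while total_sent < len(view):
        try:
            total_sent += sock.send(view[total_sent:])
        except OSError as error:
            if error.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                # The send buffer is full: wait until the socket is writable.
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    raise TimeoutError("write timed out")
                _, writable, _ = select.select([], [sock], [], remaining)
                if not writable:
                    raise TimeoutError("write timed out")
            else:
                # Without this branch, errors such as ECONNRESET or EPIPE
                # would be swallowed and the loop would retry forever.
                raise
    return total_sent
```

Removing the else branch from this sketch reproduces the reported behavior: send keeps failing with the same errno and the while loop never exits.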

Fixes #2856

…IN/EWOULDBLOCK, the exception was silently swallowed and the while True loop would spin indefinitely, requiring a manual microservice restart to recover. Add an else: raise to propagate the original OSError immediately, matching the Ruby implementation which naturally bubbles up any exception not explicitly rescued.

Also port the previously commented-out Ruby write tests to Python and add a regression test for the lock-up scenario. Suppress a valkey internal DeprecationWarning in the test suite that is not actionable from our code.
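
A regression test for the lock-up scenario could look roughly like the sketch below. It does not reproduce the tests added in this PR: write_all is a deliberately small, hypothetical stand-in for the stream's write loop, and the test names are invented. The point is the assertion on call_count, which fails fast if the loop ever retries a fatal error.

```python
import errno
import unittest
from unittest import mock


def write_all(sock, data):
    """Hypothetical stand-in for the stream's write loop (not the OpenC3 code)."""
    sent = 0
    while sent < len(data):
        try:
            sent += sock.send(data[sent:])
        except OSError as error:
            if error.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise  # fatal socket errors must propagate
            # EAGAIN/EWOULDBLOCK: retry (a real loop would wait for writability)
    return sent


class WriteLoopRegressionTest(unittest.TestCase):
    def test_fatal_errno_propagates(self):
        sock = mock.Mock()
        sock.send.side_effect = OSError(errno.ECONNRESET, "Connection reset by peer")
        with self.assertRaises(OSError) as caught:
            write_all(sock, b"payload")
        self.assertEqual(caught.exception.errno, errno.ECONNRESET)
        # Exactly one attempt: the loop must not spin on the failing socket.
        self.assertEqual(sock.send.call_count, 1)

    def test_eagain_is_retried(self):
        sock = mock.Mock()
        # First call would block, second call accepts all 7 bytes.
        sock.send.side_effect = [BlockingIOError(errno.EAGAIN, "try again"), 7]
        self.assertEqual(write_all(sock, b"payload"), 7)
        self.assertEqual(sock.send.call_count, 2)
```

Saved as a module, the sketch can be run with python -m unittest.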
@mcosgriff mcosgriff requested a review from jmthomas February 24, 2026 20:28
@codecov

codecov Bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.74%. Comparing base (098b373) to head (eaf0ea9).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2865      +/-   ##
==========================================
+ Coverage   78.72%   78.74%   +0.01%     
==========================================
  Files         667      667              
  Lines       54496    54497       +1     
  Branches      731      731              
==========================================
+ Hits        42904    42913       +9     
+ Misses      11512    11504       -8     
  Partials       80       80              
Flag           Coverage Δ
python         80.71% <ø> (+0.04%) ⬆️
ruby-api       80.26% <ø> (+0.04%) ⬆️
ruby-backend   82.07% <ø> (-0.01%) ⬇️


…subclass of OSError in Python, so the interface's exception handling will still catch it and trigger a reconnect

Comment thread openc3/python/openc3/streams/tcpip_socket_stream.py
@mcosgriff mcosgriff marked this pull request as ready for review February 24, 2026 22:57
@mcosgriff mcosgriff merged commit 640782d into main Feb 24, 2026
24 of 25 checks passed
@mcosgriff mcosgriff deleted the 2856-interface-locking-up-when-encountering-unchecked-socket-errno-in-python branch February 24, 2026 22:58
jmthomas pushed a commit that referenced this pull request Mar 21, 2026
…ket errors (#2865)

* When the write socket raised an OSError with an error other than EAGAIN/EWOULDBLOCK, the exception was silently swallowed and the while True loop would spin indefinitely, requiring a manual microservice restart to recover. Add an else: raise to propagate the original OSError immediately, matching the Ruby implementation which naturally bubbles up any exception not explicitly rescued.

Also port the previously commented-out Ruby write tests to Python and add a regression test for the lock-up scenario. Suppress a valkey internal DeprecationWarning in the test suite that is not actionable from our code.

* Raise a TimeoutError instead of RuntimeError. TimeoutError is also a subclass of OSError in Python, so the interface's exception handling will still catch it and trigger a reconnect
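
The exception hierarchy behind this choice can be checked with a short sketch. The handler below is hypothetical and only mimics an interface that reconnects on any OSError; it is not the OpenC3 interface code.

```python
def reconnect_on_socket_error(operation):
    """Hypothetical interface-level handler: any OSError triggers a reconnect."""
    try:
        operation()
        return "ok"
    except OSError as error:
        # TimeoutError is caught here because it inherits from OSError.
        # A RuntimeError would bypass this handler entirely.
        return f"reconnect ({type(error).__name__})"


def timed_out_write():
    raise TimeoutError("write timed out")


def runtime_error_write():
    raise RuntimeError("write timed out")


print(issubclass(TimeoutError, OSError))   # True
print(issubclass(RuntimeError, OSError))   # False
print(reconnect_on_socket_error(timed_out_write))  # reconnect (TimeoutError)
```

Calling reconnect_on_socket_error(runtime_error_write) lets the RuntimeError escape the handler, which is the situation the switch to TimeoutError avoids.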
Development

Successfully merging this pull request may close these issues.

Interface Locking Up when Encountering Unchecked Socket Errno in Python

2 participants