_sim_debug_flush(): Use fsync(). by bscottm · Pull Request #454 · open-simh/simh

bscottm · 2025-03-17T20:34:32Z

Fix race condition in _debug_fwrite_all() where the FILE *f output file can be permanently NULL, causing fwrite() to permanently return zero and resulting in an infinite loop on mulithreaded SIMH. This pathology primarily occurs when main thread calls flush_svc() and one of the Ethernet threads (reader, writer) emits debugging output.

Linux, macOS invoke fsync(). Windows invokes a fsync() wrapper that invokes _commit().

Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to simulate fsync().

bscottm · 2025-03-17T20:43:39Z

@pkoning2: Please merge PR #450 so that CI/CD can get past the LTO issue so that this and other builds have some chance at success (barring simulator test failures.)

pkoning2 · 2025-03-17T21:28:53Z

I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

bscottm · 2025-03-17T22:32:11Z

I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

It can still be used for single threaded code, i.e., simulators that don't use AIO.

Also, I don't have a good handle on which platforms SIMH is supposed to support (the makefile seems to support a lot more than just Linux, macOS and Windows... Haiku? SunOS variants?) So, I left it in there as CYA.

Very happy to take it out and rewrite code to assume everyone has fsync(), except for Windows, which needs the diaper.

bscottm · 2025-03-18T15:39:23Z

I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

@pkoning2: It's all just fsync now. If there's a platform out there that needs to be supported (VMS?), an issue is probably the best way to address it. Now back to the ETH_MAC * discussion...

pkoning2 · 2025-08-17T20:10:31Z

+ * 
+ * f == NULL: fwrite() returns 0 on Linux and Windows. len never gets decremented
+ * and an infinite loop ensues (cue "Forever NULL" sung to the tune of Alphaville's
+ * "Forever Young".)


Why not simply treat length written == 0 as an error? But in my testing, a NULL file pointer causes a segfault (on recent Linux as well as on Mac OS).

@pkoning2: I left the "dragon" note as more of a "Don't Do This" warning. I added extra checks deal with the f == NULL,len_written == 0 and errno != EAGAIN cases. In each case, SIMH outputs a diagnostic message (the "why") and the buffer's contents that weren't written to the debug output.

Fix race condition in _debug_fwrite_all() where the FILE *f output file can be permanently NULL, causing fwrite() to permanently return zero and resulting in an infinite loop on mulithreaded SIMH. This pathology primarily occurs when main thread calls flush_svc() and one of the Ethernet threads (reader, writer) emits debugging output. Linux and macOS platforms invoke fsync() to synchronize file data to disk. Windows has a fsync() wrapper calling _commit() providing the same semantics. Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to simulate fsync().

pmetzger · 2026-04-29T01:55:42Z

So I'm looking at this for inclusion in ZIMH and I can't understand why the old behavior of closing and reopening the stream made any sense or why an fsync() is needed over a simple fflush; this is for debugging logging after all, not a database where you need an atomic commit to durable storage.

Can someone explain to me why this can't just be:

/* Flush pending debug output. */
static t_stat _sim_debug_flush (void)
{
    if (sim_deb == NULL)                                /* no debug? */
        return SCPE_OK;

    _sim_debug_write_flush ("", 0, TRUE);

    if (fflush (sim_deb) != 0)
        return SCPE_IOERR;

    return SCPE_OK;
}

If no one has a good explanation to the contrary, I think I'm just going to do it that way. fflush is nice and portable C after all.

Fix the debug-output flush race described in open-simh/simh#454, but with a much simpler local implementation. The old debug flush path drained SIMH's internal debug buffer, then closed and reopened ordinary debug files to force the stdio buffers out. That temporarily cleared and replaced the global debug stream, which could race with other threads emitting debug output. Keep the internal debug-buffer flush, then call fflush() on the current debug stream. This preserves the useful flush behavior without closing or replacing sim_deb. Upstream-PR: open-simh/simh#454

bscottm · 2026-05-01T18:38:16Z

@pmetzger: fsync() is a hard gaurantee that data is sync-ed to the file system, whereas fflush() doesn't make that gaurantee. (Note: fsync() doesn't guarantee that file system metadata is sync-ed, only the data.)

Like you, I've gone a different direction to re-implement in Rust, although I could be convinced to contribute to ZIMH. I had enough of the trolling.

pmetzger · 2026-05-01T22:48:07Z

@bscottm My intention is to rebuild the entire thing in Rust once I have a full set of tests in place so that I feel confident that subsystems work as they're converted. Happy to discuss how the whole thing is to work; the discussions feature is open on the ZIMH repo.

pmetzger · 2026-05-01T22:49:12Z

I've thought about it, and I went with just the fflush; the fsync didn't seem necessary because if the host machine crashes hard at that moment you don't have any real guarantees about where the simulated machine got to in a debugging run anyway. I can be argued out of that position, but it's better to do it in the ZIMH repo issues and discussions.

pkoning2 · 2026-05-03T18:23:36Z

Actually, the definition of fsync is that it commits both data and metadata. You may be thinking of fdatasync which (as you can tell from the name) is metadata only.
fflush is something different, it writes buffered data to the device for stdio type buffered I/O. The Mac OS documention does not say, but the Linux one does, that the two are separate and a complete commit to disk requires fflush followed by fsync.

pmetzger · 2026-05-03T20:27:13Z

Yes, but there is no need to sync a log file hard to the durable device (hard disk or ssd); that lowers system performance (including for unrelated processes!) unnecessarily for the very rare case where the system crashes during your work, and if it crashed right then, the probability that you care about the log that much is nil. fsync is there for when you're dong a database transaction or the like, and this is not a database transaction. fflushing'ing a log, on the other hand, is perfectly reasonable. So there's no need for the fsync, but an fflush is reasonable. So I fully cleaned up this code in my own fork just to fflush.

rcornwell · 2026-05-03T20:54:37Z

I don't see any reason to call fsync, if the data is flushed and the application crashes, it will still get written to disk. fsync is designed to make sure disk is in sync with file, mostly for multi application access.

pkoning2 · 2026-05-03T21:15:25Z

I don't see any reason to call fsync, if the data is flushed and the application crashes, it will still get written to disk. fsync is designed to make sure disk is in sync with file, mostly for multi application access.

Yes, that's true. fsync is needed if you're worried about OS crashes. For application issues fflush is sufficient.

pmetzger · 2026-05-04T19:33:15Z

And this is an application issue; the goal is to get debugging information out, not to deal with rare host operating system crashes. You can clean this whole thing up into a very short routine that is much easier to understand. (I did.)

bscottm force-pushed the debug_flush_loop branch from 7df5086 to e6a4d44 Compare March 17, 2025 20:37

bscottm force-pushed the debug_flush_loop branch from e6a4d44 to 2e6a571 Compare March 18, 2025 00:23

pkoning2 reviewed Aug 17, 2025

View reviewed changes

bscottm force-pushed the debug_flush_loop branch 3 times, most recently from af23c88 to 072fe53 Compare August 21, 2025 16:28

bscottm force-pushed the debug_flush_loop branch from 072fe53 to 43407a0 Compare August 21, 2025 19:53

Conversation

bscottm commented Mar 17, 2025

Uh oh!

bscottm commented Mar 17, 2025

Uh oh!

pkoning2 commented Mar 17, 2025

Uh oh!

bscottm commented Mar 17, 2025

Uh oh!

bscottm commented Mar 18, 2025

Uh oh!

pkoning2 Aug 17, 2025

Choose a reason for hiding this comment

Uh oh!

bscottm Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmetzger commented Apr 29, 2026

Uh oh!

bscottm commented May 1, 2026

Uh oh!

pmetzger commented May 1, 2026

Uh oh!

pmetzger commented May 1, 2026

Uh oh!

pkoning2 commented May 3, 2026

Uh oh!

pmetzger commented May 3, 2026

Uh oh!

rcornwell commented May 3, 2026

Uh oh!

pkoning2 commented May 3, 2026

Uh oh!

pmetzger commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

bscottm Aug 21, 2025 •

edited

Loading