_sim_debug_flush(): Use fsync().#454
Conversation
7df5086 to
e6a4d44
Compare
|
I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync? |
It can still be used for single threaded code, i.e., simulators that don't use AIO. Also, I don't have a good handle on which platforms SIMH is supposed to support (the Very happy to take it out and rewrite code to assume everyone has |
e6a4d44 to
2e6a571
Compare
@pkoning2: It's all just |
| * | ||
| * f == NULL: fwrite() returns 0 on Linux and Windows. len never gets decremented | ||
| * and an infinite loop ensues (cue "Forever NULL" sung to the tune of Alphaville's | ||
| * "Forever Young".) |
There was a problem hiding this comment.
Why not simply treat length written == 0 as an error? But in my testing, a NULL file pointer causes a segfault (on recent Linux as well as on Mac OS).
There was a problem hiding this comment.
@pkoning2: I left the "dragon" note as more of a "Don't Do This" warning. I added extra checks deal with the f == NULL,len_written == 0 and errno != EAGAIN cases. In each case, SIMH outputs a diagnostic message (the "why") and the buffer's contents that weren't written to the debug output.
af23c88 to
072fe53
Compare
Fix race condition in _debug_fwrite_all() where the FILE *f output file can be permanently NULL, causing fwrite() to permanently return zero and resulting in an infinite loop on mulithreaded SIMH. This pathology primarily occurs when main thread calls flush_svc() and one of the Ethernet threads (reader, writer) emits debugging output. Linux and macOS platforms invoke fsync() to synchronize file data to disk. Windows has a fsync() wrapper calling _commit() providing the same semantics. Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to simulate fsync().
072fe53 to
43407a0
Compare
|
So I'm looking at this for inclusion in ZIMH and I can't understand why the old behavior of closing and reopening the stream made any sense or why an fsync() is needed over a simple fflush; this is for debugging logging after all, not a database where you need an atomic commit to durable storage. Can someone explain to me why this can't just be: /* Flush pending debug output. */
static t_stat _sim_debug_flush (void)
{
if (sim_deb == NULL) /* no debug? */
return SCPE_OK;
_sim_debug_write_flush ("", 0, TRUE);
if (fflush (sim_deb) != 0)
return SCPE_IOERR;
return SCPE_OK;
}If no one has a good explanation to the contrary, I think I'm just going to do it that way. fflush is nice and portable C after all. |
Fix the debug-output flush race described in open-simh/simh#454, but with a much simpler local implementation. The old debug flush path drained SIMH's internal debug buffer, then closed and reopened ordinary debug files to force the stdio buffers out. That temporarily cleared and replaced the global debug stream, which could race with other threads emitting debug output. Keep the internal debug-buffer flush, then call fflush() on the current debug stream. This preserves the useful flush behavior without closing or replacing sim_deb. Upstream-PR: open-simh/simh#454
|
@pmetzger: Like you, I've gone a different direction to re-implement in Rust, although I could be convinced to contribute to ZIMH. I had enough of the trolling. |
|
@bscottm My intention is to rebuild the entire thing in Rust once I have a full set of tests in place so that I feel confident that subsystems work as they're converted. Happy to discuss how the whole thing is to work; the discussions feature is open on the ZIMH repo. |
|
I've thought about it, and I went with just the fflush; the fsync didn't seem necessary because if the host machine crashes hard at that moment you don't have any real guarantees about where the simulated machine got to in a debugging run anyway. I can be argued out of that position, but it's better to do it in the ZIMH repo issues and discussions. |
|
Actually, the definition of fsync is that it commits both data and metadata. You may be thinking of fdatasync which (as you can tell from the name) is metadata only. |
|
Yes, but there is no need to sync a log file hard to the durable device (hard disk or ssd); that lowers system performance (including for unrelated processes!) unnecessarily for the very rare case where the system crashes during your work, and if it crashed right then, the probability that you care about the log that much is nil. fsync is there for when you're dong a database transaction or the like, and this is not a database transaction. fflushing'ing a log, on the other hand, is perfectly reasonable. So there's no need for the fsync, but an fflush is reasonable. So I fully cleaned up this code in my own fork just to fflush. |
|
I don't see any reason to call fsync, if the data is flushed and the application crashes, it will still get written to disk. fsync is designed to make sure disk is in sync with file, mostly for multi application access. |
Yes, that's true. fsync is needed if you're worried about OS crashes. For application issues fflush is sufficient. |
|
And this is an application issue; the goal is to get debugging information out, not to deal with rare host operating system crashes. You can clean this whole thing up into a very short routine that is much easier to understand. (I did.) |
Fix race condition in _debug_fwrite_all() where the FILE *f output file can be permanently NULL, causing fwrite() to permanently return zero and resulting in an infinite loop on mulithreaded SIMH. This pathology primarily occurs when main thread calls flush_svc() and one of the Ethernet threads (reader, writer) emits debugging output.
Linux, macOS invoke fsync(). Windows invokes a fsync() wrapper that invokes _commit().
Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to simulate fsync().