
[v2] Gather crash reports #117

Closed
@ArtemGr

Description

  • Functional test (those things tend to be flaky; we need to automatically test that crashes are caught).
  • should] For the binary Docker deployments and the internal testing there, there should be an option of letting the OS dump the core instead of manually collecting the stack trace. No need to upload the core anywhere, at least not within this issue's timeframe: the testing developers will get it manually from the image.
  • Handle Windows exceptions (example, _EXCEPTION_POINTERS, ru, StackWalker). This should speed up development by allowing me to test the crash-handling code from my primary development environments. It will also work for the Windows deployments of MM. => POC.
  • -> A full Windows build is likely necessary. Otherwise we'll be getting linking errors about missing symbols like os_portable::OS_init.
  • -> -> Try to link the C MM1 library to the MM2 Rust binary instead of the other way around.
  • Handle Unix signals (having the Linux and macOS deployments in mind).
  • should] Capture not just the C crashes but also the Rust panics. => Global panic handlers aren't stable yet (#[panic_handler]). We could set thread-local hooks, but I'll first try a simpler (?) route: using RUST_BACKTRACE and getting crash reports by capturing the standard output.
  • -> Dump C/Rust backtraces to standard output and save-to-share log.
  • -> -> Rehash invariants regarding the logging of sensitive information.
  • Backtrace without line numbers first, to improve reliability? => Not an option.
  • Scan the logs folder and see if some of the logs might constitute a crash or a failure.
  • won't] Watchdog. Mark the log dirty when we're doing something and clean when we're finished. That way we can know that there was a failure even if we were too dead to capture the backtrace in the log. We'll probably need a helper, a forked process that leaves a trace that the computer and filesystem were online. If the system was online but MM failed to leave the clean mark then something went wrong and the log might help us figure it out. It's important to leave the clean mark whenever MM is killed. Won't do it in this issue, but should probably create a separate one.
  • Make sure it works on macOS, HyperDEX team mostly deploys there.
  • -> Test the macOS build?
  • -> CI macOS build?
  • -> Remove the strip from the builds, turn debug options on.
  • Figure out where to send/store them. => I'd like to use Google Cloud Storage for that. Should ask Artem for alternatives. => For starters it might be good enough if they're printed to stdout.
  • Consider implementing this for MM1 too. => The plan is to make a new release of MM reusing the same CI build chain.
