
fix(evaluator): distinguish interrupted and failed sigterm exits #882

Draft
ngoncharenko wants to merge 3 commits into NVIDIA-NeMo:main from ngoncharenko:ngoncharenko/sigterm-handling


@ngoncharenko

Summary

  • add graceful SIGTERM interruption markers and configurable shutdown timeout handling in nemo-evaluator
  • distinguish adapter server fatal shutdowns from external interruptions so server-triggered SIGTERM exits nonzero instead of looking successful
  • update local, SLURM, and Lepton launcher flows to treat interrupted runs as non-success and skip auto-export
  • add unit coverage for shutdown handling and launcher classification paths
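The first three bullets can be illustrated with a minimal Python sketch. This is not the code in this PR: `ShutdownState`, `install_sigterm_handler`, `exit_code_for`, `classify_run`, `should_auto_export`, and the use of 143 (128 + SIGTERM) as the "interrupted" exit code are all hypothetical. It shows one way to keep a server-triggered SIGTERM from exiting 0 and to let a launcher skip auto-export for anything other than a clean success.

```python
import enum
import signal
import threading
from typing import Optional

# Hypothetical convention: 128 + signal number, as POSIX shells report it.
INTERRUPTED_EXIT_CODE = 128 + int(signal.SIGTERM)  # 143


class RunOutcome(enum.Enum):
    SUCCESS = "success"
    INTERRUPTED = "interrupted"
    FAILED = "failed"


class ShutdownState:
    """Records why the evaluator process is stopping (hypothetical helper)."""

    def __init__(self) -> None:
        self.interrupted = threading.Event()
        self.fatal_reason: Optional[str] = None

    def mark_fatal(self, reason: str) -> None:
        # The adapter server calls this before sending SIGTERM to its own
        # process, so the resulting SIGTERM is not mistaken for an external stop.
        self.fatal_reason = reason


def install_sigterm_handler(state: ShutdownState) -> None:
    """Replace the default SIGTERM action with a flag; must run in the main thread."""

    def _on_sigterm(signum, frame) -> None:
        state.interrupted.set()

    signal.signal(signal.SIGTERM, _on_sigterm)


def exit_code_for(state: ShutdownState, task_ok: bool) -> int:
    if state.fatal_reason is not None:
        return 1  # server-triggered shutdown is a failure, never exit 0
    if state.interrupted.is_set():
        return INTERRUPTED_EXIT_CODE  # external interruption
    return 0 if task_ok else 1


def classify_run(exit_code: int, marker_present: bool = False) -> RunOutcome:
    """Launcher-side view; `marker_present` stands in for an interruption marker."""
    if marker_present or exit_code == INTERRUPTED_EXIT_CODE:
        return RunOutcome.INTERRUPTED
    return RunOutcome.SUCCESS if exit_code == 0 else RunOutcome.FAILED


def should_auto_export(outcome: RunOutcome) -> bool:
    # Only a clean success is exported; interrupted and failed runs are skipped.
    return outcome is RunOutcome.SUCCESS
```

Checking `fatal_reason` before the interrupted flag is the key design choice here: a run the adapter server kills itself exits 1 and is classified as `FAILED`, while a scheduler or user SIGTERM yields 143 and `INTERRUPTED`. Neither outcome triggers auto-export.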

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wprazuch
Contributor

wprazuch commented Apr 3, 2026

Hey @ngoncharenko !
The code looks good to me, but I would like to try it on some real-world use cases before approving. ETA for this is Tuesday next week (Monday is a bank holiday in Poland). Sorry for the wait, and once again thanks for the contribution! :)

