Fix double iteration bug when resumed from a checkpoint. #20775
@Borda Thanks for the review! Let me know if there's anything you'd like me to change. Otherwise, can we go ahead and merge this?
5c9a0fd to eb3a763
Signed-off-by: sudipto baral <[email protected]>
e451080 to 130102c
Hey, @Borda just wondering if anything is blocking this PR from merging?
@property
def _is_resuming(self) -> bool:
    """Whether we're resuming training from a checkpoint."""
    return self._loaded_from_state_dict
Looking at how _loaded_from_state_dict is used, there is no direct wrapper around loading, i.e. nothing that sets it to True when loading starts and switches it back to False when loading ends. For that we may need to add a new attribute.
Hi! Thanks for taking a look at it. I am not sure why we need that. _loaded_from_state_dict is already set to True and False when necessary, so I am using it to detect checkpoint resumption.
I might have missed something. Could you please suggest what changes we need here?
self._loaded_from_state_dict = True
pytorch-lightning/src/lightning/pytorch/loops/loop.py
Lines 102 to 105 in 1e88899
def on_iteration_done(self) -> None:
    self._restarting = False
    self._loaded_from_state_dict = False
    self.reset_restart_stage()
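To make the flag lifecycle under discussion concrete, here is a minimal sketch. ToyLoop is a hypothetical stand-in, not Lightning's loop class, and it assumes the flag is set inside a load_state_dict method (the snippets above only show the assignment itself and the reset in on_iteration_done). The reset_restart_stage call is left out because its behaviour is not shown in this thread.

```python
class ToyLoop:
    """Hypothetical stand-in for a loop; not Lightning's actual class."""

    def __init__(self) -> None:
        self._restarting = False
        self._loaded_from_state_dict = False

    def load_state_dict(self, state_dict: dict) -> None:
        # Assumption: loading checkpoint state is what turns the flag on.
        self._restarting = True
        self._loaded_from_state_dict = True

    @property
    def _is_resuming(self) -> bool:
        """Whether we're resuming training from a checkpoint."""
        return self._loaded_from_state_dict

    def on_iteration_done(self) -> None:
        # Mirrors the quoted snippet: both flags are cleared after an iteration.
        self._restarting = False
        self._loaded_from_state_dict = False


loop = ToyLoop()
before = loop._is_resuming   # no checkpoint loaded yet
loop.load_state_dict({})
during = loop._is_resuming   # flag is on until the first iteration completes
loop.on_iteration_done()
after = loop._is_resuming    # flag is cleared again
```

In this sketch the flag stays True from loading until the first completed iteration, which is the window the author relies on to detect resumption.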
What does this PR do?
This PR fixes the double iter() bug discussed in #19427.
Fixes #19427
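As a generic illustration of why a second iter() call matters, the sketch below counts how often __iter__ runs on an iterable. CountingIterable is a hypothetical helper, and the snippet does not reproduce the actual call sites from #19427; it only shows that each iter() call re-runs __iter__, which can be costly or stateful for an iterable dataset.

```python
class CountingIterable:
    """Hypothetical iterable that records how many times __iter__ is called."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.iter_calls = 0

    def __iter__(self):
        # Any setup work placed here (opening files, seeding, sharding)
        # is repeated on every iter() call.
        self.iter_calls += 1
        return iter(range(self.n))


data = CountingIterable(3)
first = iter(data)    # first call
second = iter(data)   # a redundant second call re-runs __iter__
calls = data.iter_calls
items = list(second)
```

Here calls ends up at 2 even though only one iterator is consumed, so any work done in __iter__ has been performed twice.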
📚 Documentation preview 📚: https://pytorch-lightning--20775.org.readthedocs.build/en/20775/