DAOS-17737 dtx: handle race between DTX refresh and DTX abort #16535

Draft
wants to merge 1 commit into
base: master
from

Conversation

Nasf-Fan
Contributor

@Nasf-Fan Nasf-Fan commented Jun 24, 2025

If the current transaction is aborted by a race while dtx_refresh() yields, return a non-zero value to the sponsor to trigger a client-side RPC retry. That leaves the related transaction's status cleaner.

Add more checks after dtx_refresh() to avoid re-initializing an aborted DTX.

The patch also cleans up the usage of vos_dtx_validation() to handle the various DTX abort cases (including a possible resend after the abort).
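
The race described above can be pictured with a small, self-contained C model. This is only a sketch under stated assumptions, not code from the patch: `toy_dtx`, `toy_refresh()`, `toy_reinit()`, `toy_abort()` and the yield callback are hypothetical stand-ins, and `-EAGAIN` is an assumed placeholder for whatever non-zero value the patch actually returns to the sponsor. The assertion in `toy_reinit()` mirrors the kind of check named in the ticket title (`dth->dth_ent != NULL`).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DTX handle; not the real DAOS structure. */
struct toy_dtx {
	int   entry_storage; /* backing storage for the toy entry */
	int  *entry;         /* NULL once the transaction has been aborted */
	bool  aborted;       /* set by a concurrent abort */
	int   reinit_count;  /* how many times the handle was re-initialized */
};

/* Hypothetical hook that runs while the refresh "yields". */
typedef void (*toy_yield_cb)(struct toy_dtx *dtx);

static void toy_init(struct toy_dtx *dtx)
{
	dtx->entry_storage = 0;
	dtx->entry = &dtx->entry_storage;
	dtx->aborted = false;
	dtx->reinit_count = 0;
}

/* Models a concurrent abort that lands during the yield. */
static void toy_abort(struct toy_dtx *dtx)
{
	dtx->aborted = true;
	dtx->entry = NULL;
}

/* Models a yield during which nothing else touches the handle. */
static void toy_no_race(struct toy_dtx *dtx)
{
	(void)dtx;
}

/* Models a refresh that yields; the handle may change before it returns. */
static int toy_refresh(struct toy_dtx *dtx, toy_yield_cb during_yield)
{
	during_yield(dtx);
	return 0;
}

/* Re-initialization is only valid for a handle that still has its entry. */
static void toy_reinit(struct toy_dtx *dtx)
{
	assert(dtx->entry != NULL);
	dtx->reinit_count++;
}

/*
 * Re-check the handle after the refresh returns. If it was aborted during
 * the yield, skip the re-initialization and report a non-zero value so the
 * caller can retry (assumption: -EAGAIN stands in for the real error code).
 */
static int toy_refresh_and_reinit(struct toy_dtx *dtx, toy_yield_cb during_yield)
{
	int rc = toy_refresh(dtx, during_yield);

	if (rc != 0)
		return rc;
	if (dtx->aborted || dtx->entry == NULL)
		return -EAGAIN;

	toy_reinit(dtx);
	return 0;
}
```

In this model, calling `toy_refresh_and_reinit()` with `toy_no_race` re-initializes the handle and returns 0, while calling it with `toy_abort` returns the retry code without ever reaching the assertion in `toy_reinit()`.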

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Ticket title is '"D_ASSERT(dth->dth_ent != NULL);" failure in dtx_handle_reinit()'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-17737

@Nasf-Fan force-pushed the Nasf-Fan/DAOS-17737 branch 3 times, most recently from 4029745 to 4b8a926, June 26, 2025 15:36
@daosbuild3
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16535/4/testReport/

If the current transaction is aborted by a race while dtx_refresh()
yields, return a non-zero value to the sponsor to trigger a client-side
RPC retry. That leaves the related transaction's status cleaner.

Add more checks after dtx_refresh() to avoid re-initializing an aborted
DTX.

The patch also cleans up the usage of vos_dtx_validation() to handle
the various DTX abort cases (including a possible resend after the
abort).

Signed-off-by: Fan Yong <[email protected]>
@Nasf-Fan force-pushed the Nasf-Fan/DAOS-17737 branch from 4b8a926 to 95f7b81, June 27, 2025 02:06
2 participants