Skip to content

feat: add restart-safety coverage for put_writes idempotency#7237

Open
Anthony James Padavano (4444J99) wants to merge 2 commits intolangchain-ai:mainfrom
4444J99:feat/restart-safety-put-writes
Open

feat: add restart-safety coverage for put_writes idempotency#7237
Anthony James Padavano (4444J99) wants to merge 2 commits intolangchain-ai:mainfrom
4444J99:feat/restart-safety-put-writes

Conversation

@4444J99
Copy link
Copy Markdown

Summary

Adds test_put_writes_idempotent_across_restart to the checkpoint conformance suite, closing the contract-validation gap described in #7201.

What changed

libs/checkpoint-conformance/.../spec/test_put_writes.py

  • New test: test_put_writes_idempotent_across_restart — creates a checkpoint, writes pending data, re-opens the saver via factory, replays the exact same write, and asserts no duplication
  • Graceful fallback: when no factory is available, falls back to same-instance verification (matching existing test_put_writes_idempotent)
  • run_put_writes_tests now accepts optional saver_factory parameter

libs/checkpoint-conformance/.../validate.py

  • Passes registered.factory to the put_writes runner when running PUT_WRITES capability tests

Design decisions

  • Backward compatible: The new saver_factory parameter is optional. Existing runners that don't pass it still work — the test falls back to same-instance verification.
  • Minimal interface change: Only the put_writes runner signature changes (new optional kwarg). No new capabilities, no new test files.
  • Factory reuse: Uses the same factory from RegisteredCheckpointer that validate() already has access to, rather than introducing a new abstraction.

Addresses

Closes #7201

Test Plan

  • Existing conformance tests pass unchanged (no factory = fallback mode)
  • InMemorySaver: restart test passes (stateless, so replay is a no-op)
  • PostgresSaver: restart test verifies ON CONFLICT idempotency across connections
  • SQLiteSaver: restart test verifies UPSERT idempotency across file reopens

Add test_put_writes_idempotent_across_restart to the conformance suite.
When a saver_factory is provided, the test re-opens the saver and
replays the same pending writes — verifying idempotency survives
process restarts, not just same-instance duplicates.

Without a factory, gracefully falls back to same-instance verification.

The validate runner now passes the registered factory to the
put_writes runner, enabling full restart-safety coverage.

Addresses langchain-ai#7201

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83d11b763a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

}
# Pass factory for restart-safety tests
if cap == Capability.PUT_WRITES:
runner_kwargs["saver_factory"] = registered.factory
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate restart check to avoid false failures on fresh factories

Passing registered.factory unconditionally for PUT_WRITES forces test_put_writes_idempotent_across_restart down its restart branch for every checkpointer, so the fallback path is never used during normal validate() runs. For factories that intentionally return a fresh in-memory saver each time (e.g. memory_checkpointer in libs/checkpoint-conformance/tests/test_validate_memory.py), the reopened saver cannot load the first instance’s checkpoint, causing a failure unrelated to idempotent duplicate writes. This turns a compatibility test into a stricter persistence requirement and can regress existing conformant implementations.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — factory is now only passed to the PUT_WRITES runner when the checkpointer registers a non-noop lifespan (the signal for persistent storage). In-memory checkpointers skip the restart branch entirely.

Comment on lines +294 to +295
gen = saver_factory()
saver2 = await gen.__anext__()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Simulate restart by closing first saver before reopening

The new restart test opens saver2 while saver is still alive, so it validates concurrent-instance behavior instead of a true restart sequence. Backends that enforce single-writer access or hold locks per live instance can fail here even if replaying writes after an actual restart is correct, which introduces false negatives in the conformance suite.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — saver is now closed before opening saver2 to properly simulate a process restart rather than concurrent access.

@4444J99
Copy link
Copy Markdown
Author

Checking in on this — the PR adds a restart-safety idempotency test to the checkpoint conformance suite, covering the contract gap described in #7201. The test verifies that put_writes produces identical results when replayed after a simulated process restart. Happy to iterate on any feedback.

Non-persistent (in-memory) factories return fresh empty instances, so
the restart branch cannot load the first instance's checkpoint.  Only
pass saver_factory when the registered checkpointer has a custom
lifespan (indicating shared external storage).  Also close the
original saver before opening saver2 to simulate a true process
restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

This PR has been automatically closed because you are not assigned to the linked issue.

External contributors must be assigned to an issue before opening a PR for it. Please:

  1. Comment on the linked issue to request assignment from a maintainer
  2. Once assigned, edit your PR description and the PR will be reopened automatically

Maintainers: reopen this PR or remove the missing-issue-link label to bypass this check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bypass-issue-check Maintainer override: skip issue-link enforcement external

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add restart-safety coverage for put_writes idempotency

2 participants