Skip to content

PYTHON-5253 Automated Spec Test Sync #2409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

sleepyStick
Copy link
Contributor

@sleepyStick sleepyStick commented Jun 25, 2025

Generates at most one weekly PR on Monday's at 9am pst (assuming the evergreen hosts are in utc time...).

At a high level, I've written a script that simply loops through pymongo's existing spec tests and calls the existing bash resync-spec.sh to re-sync all the specs. Then if changes were made, a PR would be created. If all specs are up to date, then no PR will be made.

This PR will be created by mongodb-drivers-pr-bot with the changes. If an error occurred during the syncing of a spec, the PR description will display the name of the spec along with stderr from the bash resync-spec.sh <spec> command.

An example of a generated PR #2407

Copy link
Contributor Author

@sleepyStick sleepyStick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments that I think might help the reviewer :)

cd $SPECS/source
make
cd -
if ! [ -n "${CI:-}" ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is always defined in an EVG run and we only want to run these things in a local checkout.

#!/usr/bin/env bash

tools="../drivers-evergreen-tools"
git clone https://github.com/mongodb-labs/drivers-evergreen-tools.git $tools
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloning drivers evergreen tools because there are some scripts (mainly ./secrets-export.sh and get-access-token.sh that are there and i'd rather not copy it over)

and len(list(os.scandir(pathlib.Path(entry.path) / "tests"))) > 1
}
test_set = {entry.name.replace("-", "_") for entry in os.scandir(directory) if entry.is_dir()}
known_mappings = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a slight difference in how we name the directories vs how the spec names them sometimes? (Not 100% sure if these are correct though, so please let me know if these need to be changed!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we often have different names than the spec repo, that's expected.

if spec.name in ["asynchronous"]:
continue
process = subprocess.run(
["bash", "./.evergreen/resync-specs.sh", spec.name], # noqa: S603, S607
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note:

S603 `subprocess` call: check for execution of untrusted input
S607 Starting a process with a partial executable path

S603: I think spec.name is safe because they're only acquired from our directories within test
S607: I think bash is sufficient for executable (they want the whole path)

errored[spec.name] = process.stderr

process = subprocess.run(
["git diff --name-only | awk -F'/' '{print $2}' | sort | uniq"], # noqa: S607
Copy link
Contributor Author

@sleepyStick sleepyStick Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note:

S607 Starting a process with a partial executable path

I tried to separate the string into ["git", "diff --name-only | awk -F'/' '{print $2}' | sort | uniq"] but that caused the command to fail? So I kept it as a whole string..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each space-separated word has to be a separate arg, keeping it as one here is probably more readable.

@@ -0,0 +1,1327 @@
diff --git a/test/load_balancer/cursors.json b/test/load_balancer/cursors.json
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a file of known spec test differences that we don't want to be committed. If any one of these is incorrect, or I'm missing some, lmk!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a section to CONTRIBUTING.md detailing the why and how of modifying this file. It should also contain the policy we follow summarized by Shane in the Slack thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Just added a blurb, but lmk if the language is unclear or anything like that. (I feel like my technical writing skills are lacking sometimes)

@sleepyStick sleepyStick marked this pull request as ready for review June 26, 2025 17:04
Copy link
Contributor

@NoahStapp NoahStapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Some minor comments.

["bash", "./.evergreen/resync-specs.sh", spec.name], # noqa: S603, S607
capture_output=True,
text=True,
check=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation behind check=False here and then checking the returncode explicitly instead of using check=True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err no motivation (I learned about check when ruff yelled at me for it so) Changed it to check=True inside of a try catch!

errored[spec.name] = process.stderr

process = subprocess.run(
["git diff --name-only | awk -F'/' '{print $2}' | sort | uniq"], # noqa: S607
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each space-separated word has to be a separate arg, keeping it as one here is probably more readable.

and len(list(os.scandir(pathlib.Path(entry.path) / "tests"))) > 1
}
test_set = {entry.name.replace("-", "_") for entry in os.scandir(directory) if entry.is_dir()}
known_mappings = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we often have different names than the spec repo, that's expected.

@sleepyStick sleepyStick requested a review from NoahStapp June 27, 2025 16:09
@@ -0,0 +1,1327 @@
diff --git a/test/load_balancer/cursors.json b/test/load_balancer/cursors.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a section to CONTRIBUTING.md detailing the why and how of modifying this file. It should also contain the policy we follow summarized by Shane in the Slack thread.

@sleepyStick sleepyStick requested a review from a team as a code owner June 27, 2025 19:00
@sleepyStick sleepyStick requested a review from NoahStapp June 27, 2025 19:27
Copy link
Contributor

@NoahStapp NoahStapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some wording updates to more clearly communicate the structure and purpose of our new patch files.

@@ -432,6 +432,32 @@ update in PyMongo. This is primarily helpful if you are implementing a
new feature in PyMongo that has spec tests already implemented, or if
you are attempting to validate new spec tests in PyMongo.

### Automated Specification Test Resyncing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should focus more on the process and what this automated syncing does, rather than the literal sequence of scripts and operations it performs.

a new ticket to unskip these.

#### Adding to a patch file
Assuming the changes are committed somewhere, to add to any of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little unclear what this is saying. Can you provide a small example of what this should look like?

@sleepyStick sleepyStick requested a review from NoahStapp June 27, 2025 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants