Skip to content

fix: passing cache tests when WDK config is enabled #332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 18, 2025

Conversation

leon-xd
Copy link
Contributor

@leon-xd leon-xd commented Apr 4, 2025

Overview

Currently, our automated test matrix is not testing bindings when the WDK configuration is enabled at the project level. Locally, @wmmc88 discovered that the following wdk-macros tests failed when enabling the KMDF configuration:

  • tests::get_wdf_function_info_map::valid_input_no_cache
  • tests::get_wdf_function_info_map::valid_input_cache_exists
  • tests::inputs::generate_derived_ast_fragments::valid_input
  • tests::inputs::generate_derived_ast_fragments::valid_input_with_no_arguments.

These all test the feature I developed in #295 .

Root cause

These tests failed with configuration turned on because of the wdk crate. We conditionally compile the abstractions contained in the wdf mod based on whether KMDF/UMDF is enabled in our config. Within these abstractions, we use the call_unsafe_wdf_function_binding macro several times to call the bindings to the WDK functions.

Generating and saving the cache is triggered by the call_unsafe_wdf_function_binding macro. I used the scratch crate in order to determine the location of our saved cache -- by creating a scratch directory called wdk_macros_ast_fragments, we can write, save, and read the cache in a location that is accessible throughout the build process.

However, turning on a WDK configuration conflicted with assumptions made when writing tests for the cache generation. When all root WDK configurations were disabled, there was no call to the call_unsafe_wdf_function_binding macro, since the methods that used this macro in the wdk crate weren't compiled. Therefore, tests were written expecting a completely clean scratch directory. This worked fine until this configuration was enabled, as the call_unsafe_wdf_function_binding macro was called, and the cache was generated before tests ran. This triggered an assertion which checked for a completely clean environment, causing the failure of each of the tests.

Solution

In order to fix the root cause of this issue, I clean out the environment before each test is ran. However, more issues arose thinking about this more.

This problem broke a few assumptions I made when initially writing the tests, leading me to rewrite a portion of both the test and cache generation code. My old implementation included a test.lock that tests relied on for synchronous access to the scratch directory, as acquiring the lock used in the actual code resulted in a double acquire, and hence a deadlock. However, because now I know other crates can mutate the scratch directory, having separate locks for tests and implementation is not a clear enough guarantee to prevent a data race (even though Cargo runs build & test separately, this didn't feel like a strong enough invariant to prevent).

This lead me to conditionally compile two different versions of the function get_wdf_function_info_map. The cfg(test) version assumes an lock has already been obtained, and the cfg(not(test) version uses either a shared or exclusive lock depending on whether it needs to read or write to the cache. Now, the tests obtain the same lock as the actual code, ensuring that no data race exists between the wdk crate and the wdk-macros tests.

Other issues

cargo clippy --all-features also had not been run with the configuration turned on, and many warnings prevented its completion when run locally. While this does not explicitly stop any of the pipeline checks we have today, when we eventually turn this on in our pipelines, these clippy warnings will have to be dealt with. Any #[allow] is a result of me silencing these warnings pre-emptively.

Update 4-10

Instead of conditionally compiling two different versions of get_wdf_function_info_map, I instead conditionally compile two different locations for the scratch crate. This allows us to keep most of the old logic.

I also raised #331 as a result of tinkering with running cargo build/clippy --all-features with the UMDF config enabled.

@leon-xd leon-xd requested review from Copilot and wmmc88 and removed request for Copilot April 4, 2025 01:12
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

crates/wdk-macros/src/lib.rs:464

  • Consider using RAII (e.g., a guard type) for the exclusive file lock to ensure the lock is always released even if the closure panics.
FileExt::unlock(&flock).to_syn_result(span, "unable to unlock file lock")?;

crates/wdk-macros/src/lib.rs:482

  • Consider using RAII (e.g., a guard type) for the shared file lock to guarantee that the lock is released even if the closure panics.
FileExt::unlock(&flock).to_syn_result(span, "unable to unlock file lock")?;

Copy link
Collaborator

@wmmc88 wmmc88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new logic, you are conditionally changing the locking behavior of a single lock, depending on if the code executing was compiled for test or normally. I think this results in fairly messy code. It is not immediately obvious that what's going on here is:

  1. If cache doesnt exist and not in test, the lock is exclusively held (preventing tests from running)
  2. If the cache does exist and not in test, the lock is non-exclusively held (preventing tests from running but allowing normal proc-macro execution
  3. If the code is in test, regardless of if the cache exists or not, the lock is exclusively held and normal proc-macro execution is blocked

I think the reason the code feels unclear is largely because of how much the logic diverges between test-path and normal path, and I think we want to minimize that as much as possible. One way to do this is to revert the logic to how it was originally, but just have an extra test-specific lock. The normal path can acquire this "test" lock non-exclusively and the normal path does nothing.

@leon-xd leon-xd requested a review from wmmc88 April 9, 2025 18:41
@leon-xd leon-xd requested a review from gurry April 10, 2025 22:40
@leon-xd leon-xd requested review from wmmc88 and gurry April 17, 2025 03:27
@leon-xd leon-xd added this pull request to the merge queue Apr 18, 2025
Merged via the queue into microsoft:main with commit 790c824 Apr 18, 2025
62 checks passed
@leon-xd leon-xd deleted the fix-cache-tests branch April 18, 2025 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants