Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 #16573
Conversation
This pull request was exported from Phabricator. Differential Revision: D72945995
LGTM
    if not envs.VLLM_DISABLE_COMPILE_CACHE:
        assert hash_str is not None, (
            "failed to get the hash of the compiled graph")
        assert file_path is not None, (
            "failed to get the file path of the compiled graph")
I mentioned this to @houseroad offline, but the implication here is:
- if hash_str is None and envs.VLLM_DISABLE_COMPILE_CACHE is True, then this implies that the vLLM monkey-patched hijack_compiled_fx_graph_hash never ran
- if hijack_compiled_fx_graph_hash never ran, but compile_fx produced a graph, then this implies that we are actually relying on torch.compile's caching logic!
The overall inference is that when VLLM_DISABLE_COMPILE_CACHE is on, we may actually be relying on torch.compile's cache. This doesn't seem right to me (or at least it is confusing).
I think the right fix is something like disabling the remote caches (#16611) so we don't get into this state.
Thoughts, @youkaichao? I'm also happy to document this behavior and ship this PR as-is but I will potentially dig it out later in the standalone_compile refactor.
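The chain of implications above can be sketched as a minimal, self-contained toy. The guard and assertion messages mirror the diff in this PR, but `cache_checks_enabled` and `after_compile` are hypothetical names used only for illustration, not vLLM's actual API:

```python
import os

def cache_checks_enabled() -> bool:
    # vLLM's cache bookkeeping is only enforced when the cache is not
    # disabled via VLLM_DISABLE_COMPILE_CACHE=1.
    return os.environ.get("VLLM_DISABLE_COMPILE_CACHE", "0") != "1"

def after_compile(hash_str, file_path):
    # Mirrors the guarded asserts in the diff: with the cache enabled,
    # a missing hash means the hijack_compiled_fx_graph_hash monkey-patch
    # never ran, which should be a hard error.
    if cache_checks_enabled():
        assert hash_str is not None, (
            "failed to get the hash of the compiled graph")
        assert file_path is not None, (
            "failed to get the file path of the compiled graph")
    # With VLLM_DISABLE_COMPILE_CACHE=1 we fall through even when hash_str
    # is None -- the case where torch.compile's own cache may have served
    # the graph, which is exactly the ambiguity discussed above.
    return hash_str, file_path
```

With the cache disabled, a `None` hash passes silently; with it enabled, the same input trips the assertion, which is the behavior change under review.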
> I think the right fix is something like disabling the remote caches (#16611) so we don't get into this state.
agree, let's disable remote cache then.
Yeah, I agree with the above point: #16611 is the right solution. I think it still makes sense to add this change; my point is that we should have some overall switch, VLLM_DISABLE_COMPILE_CACHE, that disables all the logic related to the torch.compile cache.
For some internal use cases, even with VLLM_DISABLE_COMPILE_CACHE=1, we still fail on the compile-cache-related checks, which doesn't really make sense to me.
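A sketch of what such an overall switch could look like. This is illustrative only: `configure_caches` is a hypothetical helper, and the `TORCHINDUCTOR_*` environment variables are assumptions about recent PyTorch inductor knobs that may vary by version:

```python
import os

def compile_cache_disabled() -> bool:
    # The single overall switch discussed above.
    return os.environ.get("VLLM_DISABLE_COMPILE_CACHE", "0") == "1"

def configure_caches() -> None:
    if compile_cache_disabled():
        # Also steer torch.compile away from its own (local and remote)
        # caches, so an inductor cache hit cannot silently stand in for
        # the disabled vLLM cache. These knobs are assumptions about
        # recent PyTorch and may differ across versions.
        os.environ.setdefault("TORCHINDUCTOR_FORCE_DISABLE_CACHES", "1")
        os.environ.setdefault("TORCHINDUCTOR_FX_GRAPH_CACHE", "0")
```

Centralizing the check this way means a single env var controls both vLLM's bookkeeping asserts and the fallback to torch.compile's caches, avoiding the half-disabled state described in the thread.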
Summary: Pull Request resolved: vllm-project#16573. Enable Llama4 in the fbcode vLLM.
Reviewed By: henryoier
Differential Revision: D72945995
Signed-off-by: Lu Fang <[email protected]>
Disable the checks when the compile cache is not used.
Run the benchmark script to test.
Output