Speed up cache size command #17015
Cargo.toml
Outdated
zeroize = { version = "1.8.1" }
zip = { version = "2.2.3", default-features = false, features = ["deflate", "zstd", "bzip2", "lzma", "xz"] }
zstd = { version = "0.13.3" }
diskus = { git = "https://github.com/sharkdp/diskus", version = "0.8.0" }
We can't add Git dependencies, it'll break our crates.io publish
Ah, okay, happy for me to copy most of David's code over then?
I'm not sure yet.
@sharkdp are you interested in publishing the crate? Do you think we should just vendor the parts we need?
I think we should just vendor any speed-ups because this is pulling in a bunch of new dependencies (e.g., a second version of Clap).
I updated diskus today (it has always been published on crates.io, so no need for Git dependencies). You should now be able to pull it in as a relatively lightweight dependency using default-features = false. I also made an update specifically for counting apparent file size (which is what uv did here before: sum up metadata.len()), to make that more consistent with what du -sb does (exclude the size of directory entries themselves). I also cleaned up the diskus API a bit. If you want to depend on it here, the code should be something like:
let result = DiskUsage::new(&[cache.root()]).apparent_size().count();
let total_bytes = result.ignore_errors().size_in_bytes();
(or leave out the call to .apparent_size() if you'd rather count disk usage).
Note that diskus uses a heuristic for the default number of workers, which should ideally not be overridden for the best possible performance (at least according to benchmarks I did years ago).
I am also completely fine with uv vendoring diskus. It's not a lot of code. In that case, you should potentially make those two updates mentioned above, though.
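For reference, the "apparent size" counting described above (sum metadata.len() for files, leave out the size of directory entries, ignore errors) can be sketched with the standard library alone. This is a single-threaded illustration under stated assumptions, not diskus's or uv's implementation: the function name apparent_size is hypothetical, and symlinks are skipped rather than followed.

```rust
use std::fs;
use std::path::Path;

/// Sum the apparent size (`metadata.len()`) of regular files under `root`.
/// Hypothetical helper for illustration: directory entries themselves are
/// not counted, symlinks are not followed, and unreadable entries are skipped.
fn apparent_size(root: &Path) -> u64 {
    let mut total = 0;
    let mut stack = vec![root.to_path_buf()];
    while let Some(path) = stack.pop() {
        // `symlink_metadata` inspects the entry itself instead of its target.
        let Ok(meta) = fs::symlink_metadata(&path) else {
            continue; // ignore errors, e.g. files removed while walking
        };
        if meta.is_dir() {
            let Ok(entries) = fs::read_dir(&path) else {
                continue;
            };
            for entry in entries.flatten() {
                stack.push(entry.path());
            }
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    total
}

fn main() {
    // Build a small throwaway tree so the example is self-contained.
    let dir = std::env::temp_dir().join(format!("apparent_size_demo_{}", std::process::id()));
    fs::create_dir_all(dir.join("nested")).unwrap();
    fs::write(dir.join("a.bin"), b"abc").unwrap();
    fs::write(dir.join("nested").join("b.bin"), b"hello").unwrap();

    println!("{} bytes", apparent_size(&dir));

    fs::remove_dir_all(&dir).unwrap();
}
```

A vendored version would presumably distribute the directory walk over worker threads; the counting rule per entry stays the same.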
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.9.17` -> `0.9.18` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.9.18`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0918)

[Compare Source](astral-sh/uv@0.9.17...0.9.18)

Released on 2025-12-16.

##### Enhancements

- Add value hints to command line arguments to improve shell completion accuracy ([#17080](astral-sh/uv#17080))
- Improve error handling in `uv publish` ([#17096](astral-sh/uv#17096))
- Improve rendering of multiline error messages ([#17132](astral-sh/uv#17132))
- Support redirects in `uv publish` ([#17130](astral-sh/uv#17130))
- Include Docker images with the alpine version, e.g., `python3.x-alpine3.23` ([#17100](astral-sh/uv#17100))

##### Configuration

- Accept `--torch-backend` in `[tool.uv]` ([#17116](astral-sh/uv#17116))

##### Performance

- Speed up `uv cache size` ([#17015](astral-sh/uv#17015))
- Initialize S3 signer once ([#17092](astral-sh/uv#17092))

##### Bug fixes

- Avoid panics due to reads on failed requests ([#17098](astral-sh/uv#17098))
- Enforce latest-version in `@latest` requests ([#17114](astral-sh/uv#17114))
- Explicitly set `EntryType` for file entries in tar ([#17043](astral-sh/uv#17043))
- Ignore `pyproject.toml` index username in lockfile comparison ([#16995](astral-sh/uv#16995))
- Relax error when using `uv add` with `UV_GIT_LFS` set ([#17127](astral-sh/uv#17127))
- Support file locks on ExFAT on macOS ([#17115](astral-sh/uv#17115))
- Change schema for `exclude-newer` into optional string ([#17121](astral-sh/uv#17121))

##### Documentation

- Drop arm musl caveat from Docker documentation ([#17111](astral-sh/uv#17111))
- Fix version reference in resolver example ([#17085](astral-sh/uv#17085))
- Better documentation for `exclude-newer*` ([#17079](astral-sh/uv#17079))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Summary
uv cache size can be quite slow. Here I use https://github.com/sharkdp/diskus to walk the cache directory with multiple threads. This adds a CLI option to set the number of threads, defaulting to std::thread::available_parallelism(), or 1 if that is unavailable.
Test Plan
Added a CLI statement with an info-log test.
I believe this is a fair test, where I set the cache dir to a large directory.
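The thread-count default described in the summary can be sketched as follows. This is a minimal illustration, not the code from this PR: resolve_threads and its Option<usize> parameter are hypothetical stand-ins for however the CLI option is actually wired up, and treating 0 as "unset" is an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Pick the number of worker threads (hypothetical helper).
/// An explicit, non-zero request wins; otherwise fall back to
/// `available_parallelism()`, and to 1 if that cannot be determined.
fn resolve_threads(requested: Option<usize>) -> usize {
    requested
        .filter(|&n| n > 0) // assumption: 0 means "not set"
        .unwrap_or_else(|| {
            thread::available_parallelism()
                .map(NonZeroUsize::get)
                .unwrap_or(1)
        })
}

fn main() {
    println!("default workers: {}", resolve_threads(None));
    println!("explicit workers: {}", resolve_threads(Some(4)));
}
```

Note that the review thread suggests leaving diskus's own worker heuristic in place for best performance, so an override like this may only be useful as an escape hatch.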