Conversation

@konstin
Member

@konstin konstin commented Oct 17, 2025

When one uv process is running and another calls `uv cache clean` or `uv cache prune`, we currently deadlock, sometimes until the CI timeout (astral-sh/setup-uv#588). To avoid this, we add a default 5 min timeout when waiting for a lock. 5 min balances allowing in-progress builds to finish, especially with larger native dependencies, against giving timely errors for deadlocks on (remote) systems.
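
As a rough illustration of what "a timeout waiting for a lock" can look like, here is a minimal sketch that polls a non-blocking acquisition attempt until a deadline. The helper name `acquire_with_deadline` and the polling approach are assumptions for illustration only; this is not the code in this branch.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of uv): call `try_acquire` until it yields a
/// value or `timeout` has elapsed, sleeping `poll` between attempts.
pub fn acquire_with_deadline<T>(
    mut try_acquire: impl FnMut() -> Option<T>,
    timeout: Duration,
    poll: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // A non-blocking attempt, e.g. a try-lock on a lock file.
        if let Some(guard) = try_acquire() {
            return Some(guard);
        }
        // Give up once the deadline has passed instead of blocking forever.
        if Instant::now() >= deadline {
            return None;
        }
        std::thread::sleep(poll);
    }
}
```

A caller would turn the `None` case into a user-facing error instead of hanging until the CI job is killed.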

Commit 1 is a refactoring.

This branch also fixes a problem with the logging where acquired and released resources currently mismatch:

DEBUG Acquired lock for `https://github.com/tqdm/tqdm`
DEBUG Using existing Git source `https://github.com/tqdm/tqdm`
DEBUG Released lock at `C:\Users\Konsti\AppData\Local\uv\cache\git-v0\locks\16bb813afef8edd2`

@konstin konstin added the bug Something isn't working label Oct 17, 2025
@konstin konstin temporarily deployed to uv-test-registries October 17, 2025 11:55 — with GitHub Actions Inactive
@konstin konstin force-pushed the konsti/locked-file-timeout branch from 2287b53 to 2a5b779 Compare October 17, 2025 12:48
@konstin konstin temporarily deployed to uv-test-registries October 17, 2025 12:49 — with GitHub Actions Inactive
    timeout: Duration,
) -> Option<Output> {
    let (sender, receiver) = std::sync::mpsc::channel();
    thread::spawn(move || {
Member Author

This should happen rarely and already involves waiting, so we can spawn a thread. I quickly looked into making it generally async but it didn't seem worth the churn.

Member

Hm, we only have a couple of calls to the blocking / sync versions of the lock APIs. I'd be tempted to make them async.

I wonder if we should move this timeout handling to the API functions, so we can use run_with_timeout in the blocking / sync versions and just use an async timeout in the async versions? I'm wary of spawning a thread just for a timeout in the async case.
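
For readers following the thread, the thread-plus-channel pattern under discussion can be sketched as below. Only the signature fragment and the first two body lines appear in the diff above; the rest, including the generic bounds, is an assumed reconstruction and may differ from the actual code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Assumed reconstruction: run `f` on a helper thread and wait at most
/// `timeout` for its result. Returns `None` if no result arrived in time.
pub fn run_with_timeout<Output: Send + 'static>(
    f: impl FnOnce() -> Output + Send + 'static,
    timeout: Duration,
) -> Option<Output> {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be dropped after a timeout, so the send
        // error is deliberately ignored.
        let _ = sender.send(f());
    });
    // `recv_timeout` errors on timeout, or if the helper thread panicked
    // before sending; both cases map to `None` here.
    receiver.recv_timeout(timeout).ok()
}
```

Note that on timeout the helper thread is not cancelled; it keeps running detached until `f` returns, which is one cost of this approach compared with an async timeout.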

@konstin konstin temporarily deployed to uv-test-registries October 17, 2025 13:52 — with GitHub Actions Inactive
@konstin konstin temporarily deployed to uv-test-registries October 17, 2025 15:18 — with GitHub Actions Inactive
Comment on lines +15 to +34
/// Parsed value of `UV_LOCK_TIMEOUT`, with a default of 5 min.
static LOCK_TIMEOUT: LazyLock<Duration> = LazyLock::new(|| {
    let default_timeout = Duration::from_secs(300);
    let Some(lock_timeout) = env::var_os(EnvVars::UV_LOCK_TIMEOUT) else {
        return default_timeout;
    };

    if let Some(lock_timeout) = lock_timeout
        .to_str()
        .and_then(|lock_timeout| lock_timeout.parse::<u64>().ok())
    {
        Duration::from_secs(lock_timeout)
    } else {
        warn!(
            "Could not parse value of {} as integer: {:?}",
            EnvVars::UV_LOCK_TIMEOUT,
            lock_timeout
        );
        default_timeout
    }
});
Member

It'd be nice to move this to our standard environment variable parsing in EnvironmentOptions instead; I don't want to keep adding ad-hoc parsing like this.

If you want to defer it to reduce churn, that's okay, but we should add it to the tracking issue and make sure it's moved.

Member Author

@konstin konstin Oct 22, 2025

I've added it to #14720. Do you want a separate tracking issue?

I had looked into parsing this centrally, but the locks are called in a lot of locations, including e.g. a LazyLock in a Default impl (`match TextCredentialStore::read(&path) {`).
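
One way to make the parsing easier to relocate later is to separate it from the environment lookup. The sketch below is a hypothetical pure helper (`parse_lock_timeout` and `DEFAULT_LOCK_TIMEOUT` are illustrative names, not uv APIs) that mirrors the logic of the `LazyLock` above and returns an error message instead of logging a warning.

```rust
use std::ffi::OsStr;
use std::time::Duration;

/// Default taken from the PR description: 5 minutes.
const DEFAULT_LOCK_TIMEOUT: Duration = Duration::from_secs(300);

/// Hypothetical helper: interpret the raw value of `UV_LOCK_TIMEOUT` as whole
/// seconds. An unset variable yields the default; an unparsable value yields
/// an error that the caller can log before falling back.
pub fn parse_lock_timeout(raw: Option<&OsStr>) -> Result<Duration, String> {
    let Some(raw) = raw else {
        return Ok(DEFAULT_LOCK_TIMEOUT);
    };
    raw.to_str()
        .and_then(|value| value.parse::<u64>().ok())
        .map(Duration::from_secs)
        .ok_or_else(|| format!("Could not parse value of UV_LOCK_TIMEOUT as integer: {raw:?}"))
}
```

Because the helper takes the raw value as an argument, it can be called from a `LazyLock` initializer today and from central settings parsing later without changing its behavior.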

#[derive(Debug, Error)]
pub enum LockedFileError {
    #[error(
        "Timeout ({}s) when waiting for lock on `{}` at `{}`, is another uv process running? Set `{}` to increase the timeout.",
Member

We might want to say "You can set ... to increase the timeout" instead of "Set", which makes it sound like you should do that as the solution.
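
To make the suggested wording concrete, here is a sketch of the message with the softer phrasing. It uses a hand-written `Display` impl on a hypothetical `LockTimeout` struct instead of the `thiserror` attribute in the diff; the field names are assumptions, and the final wording in uv may differ.

```rust
use std::fmt;
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical stand-in for the timeout variant of `LockedFileError`.
#[derive(Debug)]
pub struct LockTimeout {
    pub timeout: Duration,
    pub resource: String,
    pub path: PathBuf,
}

impl fmt::Display for LockTimeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // "You can set" presents the variable as an option, not as the fix.
        write!(
            f,
            "Timeout ({}s) when waiting for lock on `{}` at `{}`, is another uv process running? You can set `UV_LOCK_TIMEOUT` to increase the timeout.",
            self.timeout.as_secs(),
            self.resource,
            self.path.display(),
        )
    }
}
```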

Comment on lines 239 to 273
    // Write a test package that builds for a while
    let child_pyproject_toml = context.temp_dir.child("pyproject.toml");
    child_pyproject_toml.write_str(indoc! {r#"
        [project]
        name = "child"
        version = "0.1.0"
        requires-python = ">=3.9"
        [build-system]
        requires = []
        backend-path = ["."]
        build-backend = "build_backend"
    "#})?;
    // File to wait until the lock is acquired from starting the build.
    let ready_file = context.temp_dir.child("ready_file.txt");
    let build_backend = context.temp_dir.child("build_backend.py");
    build_backend.write_str(&formatdoc! {r#"
        import time
        from pathlib import Path
        Path(r"{}").touch()
        # Make the test fail quickly if something goes wrong
        time.sleep(10)
        "#,
        // Don't run tests in directories with double quotes, please.
        ready_file.display(),
    })?;

    let mut child = context.pip_install().arg(".").spawn()?;

    // Wait until we've acquired the lock in the first process.
    while !ready_file.exists() {
        std::thread::sleep(std::time::Duration::from_millis(1));
    }
Member

I think this is more complicated than it needs to be. We can just do

let _cache = uv_cache::Cache::from_path(context.cache_dir.path()).with_exclusive_lock();

Member

@zanieb zanieb left a comment

#16342 (comment) is my main remaining caveat.

We should probably also add a note in https://docs.astral.sh/uv/concepts/cache since that's the main place this will be relevant.

@zanieb zanieb added enhancement New feature or improvement to existing functionality and removed bug Something isn't working labels Oct 22, 2025
@zanieb
Member

zanieb commented Oct 22, 2025

On the timing, I guess I might expect something like 60s rather than 5m? 5m is nice and conservative though; we could reduce it later once we see that 5m doesn't break anything.

) -> anyhow::Result<Vec<PathBuf>> {
-    let cache = Cache::from_path(temp_dir.child("cache").to_path_buf()).init()?;
+    let cache = Cache::from_path(temp_dir.child("cache").to_path_buf())
+        .init_no_wait()?
Member Author

That's a somewhat riskier change because it assumes tests do not lock or spawn something in the background and then operate on Python versions.

Member Author

The alternative is making every integration test async.

@konstin konstin temporarily deployed to uv-test-registries October 27, 2025 12:58 — with GitHub Actions Inactive
@konstin konstin force-pushed the konsti/locked-file-timeout branch from e15037e to 762fbf6 Compare October 27, 2025 13:05
@konstin konstin temporarily deployed to uv-test-registries October 27, 2025 13:07 — with GitHub Actions Inactive
@konstin
Member Author

konstin commented Oct 27, 2025

I rewrote it entirely async and removed the duplication between sync and async as well as shared and exclusive.

> On the timing, I guess I might expect something like 60s rather than 5m? 5m is nice and conservative though, we could reduce it later once we see that 5m doesn't break anything

I can see some (e.g. Rust) builds taking >60s, so I'd like to go with a higher timeout.

@konstin konstin temporarily deployed to uv-test-registries October 28, 2025 14:34 — with GitHub Actions Inactive
@konstin konstin requested a review from zanieb November 3, 2025 21:45
@konstin konstin force-pushed the konsti/locked-file-timeout branch from ac8cb01 to fce9233 Compare November 3, 2025 21:45
@konstin konstin temporarily deployed to uv-test-registries November 3, 2025 21:48 — with GitHub Actions Inactive
@konstin konstin force-pushed the konsti/locked-file-timeout branch from fce9233 to 9f4664d Compare November 3, 2025 21:52
@konstin
Member Author

konstin commented Nov 3, 2025

I've rebased on top of the change that drops fs2.

@konstin konstin temporarily deployed to uv-test-registries November 3, 2025 21:54 — with GitHub Actions Inactive
@konstin konstin force-pushed the konsti/locked-file-timeout branch 2 times, most recently from 3febf8f to 38699ba Compare November 19, 2025 12:54
@konstin konstin force-pushed the konsti/locked-file-timeout branch from 38699ba to 3e2d46e Compare December 2, 2025 15:18
@konstin konstin force-pushed the konsti/locked-file-timeout branch from 3e2d46e to 4d335c0 Compare December 4, 2025 11:32
@konstin konstin temporarily deployed to uv-test-registries December 4, 2025 11:35 — with GitHub Actions Inactive
When a process is running and another calls `uv cache clean` or `uv cache prune` we currently deadlock - sometimes until the CI timeout (astral-sh/setup-uv#588). To avoid this, we add a default 5 min timeout waiting for a lock. 5 min balances allowing in-progress builds to finish, especially with larger native dependencies, while also giving timely errors for deadlocks on (remote) systems.

Handle not found errors better

Windows fix

Fix windows

Update docs

Review

Simplify test case

Clippy

Use async and write docs

Update snapshot
@konstin konstin force-pushed the konsti/locked-file-timeout branch from 4d335c0 to a36e589 Compare December 4, 2025 11:54
@konstin konstin temporarily deployed to uv-test-registries December 4, 2025 11:56 — with GitHub Actions Inactive
@konstin konstin merged commit 62bf921 into main Dec 4, 2025
163 checks passed
@konstin konstin deleted the konsti/locked-file-timeout branch December 4, 2025 13:59
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Dec 12, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.9.13` -> `0.9.17` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.9.17`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0917)

[Compare Source](astral-sh/uv@0.9.16...0.9.17)

Released on 2025-12-09.

##### Enhancements

- Add `torch-tensorrt` and `torchao` to the PyTorch list ([#17053](astral-sh/uv#17053))
- Add hint for misplaced `--verbose` in `uv tool run` ([#17020](astral-sh/uv#17020))
- Add support for relative durations in `exclude-newer` (a.k.a., dependency cooldowns) ([#16814](astral-sh/uv#16814))
- Add support for relocatable nushell activation script ([#17036](astral-sh/uv#17036))

##### Bug fixes

- Respect dropped (but explicit) indexes in dependency groups ([#17012](astral-sh/uv#17012))

##### Documentation

- Improve `source-exclude` reference docs ([#16832](astral-sh/uv#16832))
- Recommend `UV_NO_DEV` in Docker installs ([#17030](astral-sh/uv#17030))
- Update `UV_VERSION` in docs for GitLab CI/CD ([#17040](astral-sh/uv#17040))

### [`v0.9.16`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0916)

[Compare Source](astral-sh/uv@0.9.15...0.9.16)

Released on 2025-12-06.

##### Python

- Add CPython 3.14.2
- Add CPython 3.13.11

##### Enhancements

- Add a 5m default timeout to acquiring file locks to fail faster on deadlock ([#16342](astral-sh/uv#16342))
- Add a stub `debug` subcommand to `uv pip` announcing its intentional absence ([#16966](astral-sh/uv#16966))
- Add bounds in `uv add --script` ([#16954](astral-sh/uv#16954))
- Add brew specific message for `uv self update` ([#16838](astral-sh/uv#16838))
- Error when built wheel is for the wrong platform ([#16074](astral-sh/uv#16074))
- Filter wheels from PEP 751 files based on `--no-binary` et al in `uv pip compile` ([#16956](astral-sh/uv#16956))
- Support `--target` and `--prefix` in `uv pip list`, `uv pip freeze`, and `uv pip show` ([#16955](astral-sh/uv#16955))
- Tweak language for build backend validation errors ([#16720](astral-sh/uv#16720))
- Use explicit credentials cache instead of global static ([#16768](astral-sh/uv#16768))
- Enable SIMD in HTML parsing ([#17010](astral-sh/uv#17010))

##### Preview features

- Fix missing preview warning in `uv workspace metadata` ([#16988](astral-sh/uv#16988))
- Add a `uv auth helper --protocol bazel` command ([#16886](astral-sh/uv#16886))

##### Bug fixes

- Fix Pyston wheel compatibility tags ([#16972](astral-sh/uv#16972))
- Allow redundant entries in `tool.uv.build-backend.module-name` but emit warnings ([#16928](astral-sh/uv#16928))
- Fix infinite loop in non-attribute re-treats during HTML parsing ([#17010](astral-sh/uv#17010))

##### Documentation

- Clarify `--project` flag help text to indicate project discovery ([#16965](astral-sh/uv#16965))
- Regenerate the crates.io READMEs on release ([#16992](astral-sh/uv#16992))
- Update Docker integration guide to prefer `COPY` over `ADD` for simple cases ([#16883](astral-sh/uv#16883))
- Update PyTorch documentation to include information about supporting CUDA 13.0.x ([#16957](astral-sh/uv#16957))
- Update the versioning policy ([#16710](astral-sh/uv#16710))
- Upgrade PyTorch documentation to latest versions ([#16970](astral-sh/uv#16970))

### [`v0.9.15`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0915)

[Compare Source](astral-sh/uv@0.9.14...0.9.15)

Released on 2025-12-02.

##### Python

- Add CPython 3.14.1
- Add CPython 3.13.10

##### Enhancements

- Add ROCm 6.4 to `--torch-backend=auto` ([#16919](astral-sh/uv#16919))
- Add a Windows manifest to uv binaries ([#16894](astral-sh/uv#16894))
- Add LFS toggle to Git sources ([#16143](astral-sh/uv#16143))
- Cache source reads during resolution ([#16888](astral-sh/uv#16888))
- Allow reading requirements from scripts without an extension ([#16923](astral-sh/uv#16923))
- Allow reading requirements from scripts with HTTP(S) paths ([#16891](astral-sh/uv#16891))

##### Configuration

- Add `UV_HIDE_BUILD_OUTPUT` to omit build logs ([#16885](astral-sh/uv#16885))

##### Bug fixes

- Fix `uv-trampoline-builder` builds from crates.io by moving bundled executables ([#16922](astral-sh/uv#16922))
- Respect `NO_COLOR` and always show the command as a header when paging `uv help` output ([#16908](astral-sh/uv#16908))
- Use `0o666` permissions for flock files instead of `0o777` ([#16845](astral-sh/uv#16845))
- Revert "Bump `astral-tl` to v0.7.10 ([#16887](astral-sh/uv#16887))" to narrow down a regression causing hangs in metadata retrieval ([#16938](astral-sh/uv#16938))

##### Documentation

- Link to the uv version in crates.io member READMEs ([#16939](astral-sh/uv#16939))

### [`v0.9.14`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0914)

[Compare Source](astral-sh/uv@0.9.13...0.9.14)

Released on 2025-12-01.

##### Performance

- Bump `astral-tl` to v0.7.10 to enable SIMD for HTML parsing ([#16887](astral-sh/uv#16887))

##### Bug fixes

- Allow earlier post releases with exclusive ordering ([#16881](astral-sh/uv#16881))
- Prefer updating existing `.zshenv` over creating a new one in `tool update-shell` ([#16866](astral-sh/uv#16866))
- Respect `-e` flags in `uv add` ([#16882](astral-sh/uv#16882))

##### Enhancements

- Attach subcommand to User-Agent string ([#16837](astral-sh/uv#16837))
- Prefer `UV_WORKING_DIR` over `UV_WORKING_DIRECTORY` for consistency ([#16884](astral-sh/uv#16884))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Labels

enhancement New feature or improvement to existing functionality

3 participants