@e-nomem
Contributor

@e-nomem e-nomem commented Dec 9, 2025

Summary

This PR explicitly sets the entry type for files in an sdist. This changes the entry type from AREGTYPE (the 'legacy' regular file type) to REGTYPE (the 'normal' regular file type) in the generated tar.
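
For reference, Python's tarfile module exposes both type flags as constants. The following is a minimal sketch using only the standard library (the entry name `file1` is just a placeholder); it shows that the two flags differ in the raw type byte while both are treated as regular files:

```python
import tarfile

# AREGTYPE is the legacy flag (a NUL byte); REGTYPE is the normal flag (ASCII "0").
print(tarfile.AREGTYPE, tarfile.REGTYPE)

legacy = tarfile.TarInfo("file1")
legacy.type = tarfile.AREGTYPE

normal = tarfile.TarInfo("file1")  # TarInfo defaults to REGTYPE

# Both flags are members of tarfile.REGULAR_TYPES, so isreg() is True for each.
both_regular = legacy.isreg() and normal.isreg()
print(both_regular)
```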

This change works around a bug in the Python tarfile module that causes every entry after a certain point in the tar to be silently ignored if any entry matches some very specific conditions. In maturin this was very visible: PKG-INFO was written at the very end of the archive, so twine check would loudly complain that PKG-INFO was missing and that the sdist was invalid. In uv, PKG-INFO is written at the beginning, so this issue is unlikely to be caught.
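
To illustrate why a dropped PKG-INFO is so visible, here is a rough sketch of a check that looks for a top-level PKG-INFO in an sdist. It is not twine's actual implementation; `has_pkg_info`, `build_sdist` and the `pkg-1.0` directory are hypothetical names made up for this example:

```python
import io
import tarfile


def has_pkg_info(fileobj):
    """Return True if the archive lists a `<top-level dir>/PKG-INFO` member."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                return True
    return False


def build_sdist(names):
    """Hypothetical helper: build an in-memory .tar.gz containing the given names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


print(has_pkg_info(build_sdist(["pkg-1.0/pyproject.toml", "pkg-1.0/PKG-INFO"])))
```

A check like this only sees the members that tarfile reports, so if the reader stops early, anything written after that point (such as a trailing PKG-INFO) looks missing.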

Note that this change does mean that sdists created with newer versions of the uv build backend will not be byte-for-byte identical with sdists from an older version.
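
As a rough illustration of where the bytes differ, the sketch below builds two otherwise identical single-entry archives with Python's tarfile (not with the uv build backend) and lists the offsets that change. The helper name `single_entry_archive` is made up for this example:

```python
import io
import tarfile


def single_entry_archive(entry_type):
    """Build a one-entry GNU tar in memory using the given type flag."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        data = b"file1"
        info = tarfile.TarInfo("file1")  # mtime defaults to 0, so output is stable
        info.size = len(data)
        info.type = entry_type
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


old = single_entry_archive(tarfile.AREGTYPE)
new = single_entry_archive(tarfile.REGTYPE)

# Offsets where the two archives differ: the type flag sits at byte 156 of the
# header, and the header checksum field (bytes 148-155) changes along with it.
diff_offsets = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
print(diff_offsets)
```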

See PyO3/maturin#2855 (comment)

Test Plan

This is the same change that was made in maturin to work around the same issue.

@woodruffw
Member

woodruffw commented Dec 9, 2025

> This change works around a bug in the python tarfile module that causes all entries after a certain point in the tar to be silently ignored if any entry matches some very specific conditions.

Out of curiosity, did you file a bug with CPython for this? I suspect it's the kind of thing they'd be interested in fixing in a patch release for each non-EOL version.

Edit: my bad, I see you filed it as python/cpython#141707! Linking here so these get cross-referenced 🙂

@zanieb zanieb requested a review from konstin December 9, 2025 17:15
@e-nomem
Contributor Author

e-nomem commented Dec 9, 2025

There's also a related issue in tar-rs proposing that headers default to REGTYPE instead of AREGTYPE, since I suspect most users simply aren't setting this flag explicitly rather than actually wanting the legacy type. It's not likely to get merged anytime soon, though, because it's a breaking change: alexcrichton/tar-rs#422

@konstin
Member

konstin commented Dec 10, 2025

Do you have a reproducible example (ideally a standalone Rust script) and a Python script that both we and CPython can use to check the problem and its fix?

@e-nomem
Contributor Author

e-nomem commented Dec 10, 2025

Here's a Rust script that will create two tar files in the local directory:

#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
tar = "0.4.44"
---
use std::fs::File;
use std::io::Error as IoError;
use std::path::Path;

use tar::Builder;
use tar::EntryType;
use tar::Header;

fn create_tar(filename: impl AsRef<Path>, entry_type: Option<EntryType>) -> Result<(), IoError> {
	let file = File::options().write(true).create(true).truncate(true).open(filename)?;
	let mut tarfile = Builder::new(file);

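	// The middle entry's path is 103 bytes long, which exceeds the 100-byte
	// name field of a tar header.
	let entries = [
		"file1".into(),
		format!("{}/bbb", "a".repeat(99)),
		"file2".into(),
	];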

	for entry in entries {
		let mut header = Header::new_gnu();
		header.set_mode(0o644);
		header.set_size(entry.len() as u64);
		if let Some(entry_type) = entry_type {
			header.set_entry_type(entry_type);
		}
		tarfile.append_data(&mut header, &entry, entry.as_bytes())?;
	}

	tarfile.finish()?;
	Ok(())
}

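fn main() -> Result<(), IoError> {
	// Explicit REGTYPE: Python's tarfile reads all three entries.
	create_tar("good.tar", Some(EntryType::Regular))?;
	// No explicit type: the header keeps the tar-rs default (AREGTYPE).
	create_tar("bad.tar", None)?;
	Ok(())
}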

And here's a Python script to test the two tar files:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "pytest",
# ]
# ///
import tarfile
from io import BytesIO

import pytest


def get_members(filename):
	with tarfile.open(filename, 'r') as tar:
		return tar.getmembers()


def test_good_tar():
	assert len(get_members("good.tar")) == 3


# Expected to fail on affected CPython versions, where entries are silently dropped.
@pytest.mark.xfail(strict=True)
def test_bad_tar():
	assert len(get_members("bad.tar")) == 3


@pytest.mark.parametrize('format', (
	pytest.param(tarfile.GNU_FORMAT, id="gnu"),
	pytest.param(tarfile.PAX_FORMAT, id="pax"),
))
@pytest.mark.xfail(strict=True)
def test_python_only(format):
	fp = BytesIO()
	with tarfile.open(mode='w', fileobj=fp, format=format) as tar:
		info = tarfile.TarInfo()
		info.type = tarfile.AREGTYPE
		info.name = ("a" * 99) + "/bbb"
		tar.addfile(info)

		expected = {t.name: t.type for t in tar.getmembers()}
	
	fp.seek(0)
	with tarfile.open(mode='r', fileobj=fp) as tar:
		actual = {t.name: t.type for t in tar.getmembers()}
	
	assert expected == actual


if __name__ == "__main__":
	pytest.main([__file__])
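
To see what the long-named entry looks like on disk, the sketch below inspects the raw header bytes that Python's tarfile writes for it. The helper name `raw_header_fields` is made up for this example. My reading, which is an assumption and not something confirmed in this thread, is that the combination shown here (a name field truncated so that it ends in "/" plus the legacy type flag) is what makes the reader mistake the entry for a directory:

```python
import io
import tarfile

BLOCK = 512  # tar archives are laid out in 512-byte blocks


def raw_header_fields(entry_type):
    """Write one long-named entry; return (name field, type flag) of its header."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("a" * 99 + "/bbb")
        info.type = entry_type
        tar.addfile(info)  # size is 0, so no data blocks follow the header
    raw = buf.getvalue()
    # Block 0 is the GNU long-name header, block 1 holds the full name,
    # and block 2 is the entry's own header with the name cut to 100 bytes.
    header = raw[2 * BLOCK : 3 * BLOCK]
    return header[:100], header[156:157]


legacy_name, legacy_type = raw_header_fields(tarfile.AREGTYPE)
normal_name, normal_type = raw_header_fields(tarfile.REGTYPE)
print(legacy_name.endswith(b"/"), legacy_type, normal_type)
```

Setting the type explicitly changes only the flag byte; the truncated name field is the same in both archives.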

Member

@konstin konstin left a comment


Thanks!

@konstin
Member

konstin commented Dec 11, 2025

I've added a comment explaining why we need to set that type.

@konstin konstin enabled auto-merge (squash) December 11, 2025 10:24
@konstin konstin merged commit 3bb7f67 into astral-sh:main Dec 11, 2025
101 checks passed
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Dec 18, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.9.17` -> `0.9.18` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.9.18`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0918)

[Compare Source](astral-sh/uv@0.9.17...0.9.18)

Released on 2025-12-16.

##### Enhancements

- Add value hints to command line arguments to improve shell completion accuracy ([#17080](astral-sh/uv#17080))
- Improve error handling in `uv publish` ([#17096](astral-sh/uv#17096))
- Improve rendering of multiline error messages ([#17132](astral-sh/uv#17132))
- Support redirects in `uv publish` ([#17130](astral-sh/uv#17130))
- Include Docker images with the alpine version, e.g., `python3.x-alpine3.23` ([#17100](astral-sh/uv#17100))

##### Configuration

- Accept `--torch-backend` in `[tool.uv]` ([#17116](astral-sh/uv#17116))

##### Performance

- Speed up `uv cache size` ([#17015](astral-sh/uv#17015))
- Initialize S3 signer once ([#17092](astral-sh/uv#17092))

##### Bug fixes

- Avoid panics due to reads on failed requests ([#17098](astral-sh/uv#17098))
- Enforce latest-version in `@latest` requests ([#17114](astral-sh/uv#17114))
- Explicitly set `EntryType` for file entries in tar ([#17043](astral-sh/uv#17043))
- Ignore `pyproject.toml` index username in lockfile comparison ([#16995](astral-sh/uv#16995))
- Relax error when using `uv add` with `UV_GIT_LFS` set ([#17127](astral-sh/uv#17127))
- Support file locks on ExFAT on macOS ([#17115](astral-sh/uv#17115))
- Change schema for `exclude-newer` into optional string ([#17121](astral-sh/uv#17121))

##### Documentation

- Drop arm musl caveat from Docker documentation ([#17111](astral-sh/uv#17111))
- Fix version reference in resolver example ([#17085](astral-sh/uv#17085))
- Better documentation for `exclude-newer*` ([#17079](astral-sh/uv#17079))

</details>

