Skip to content

Specialize std::io::copy for files, sockets, or pipes on Linux #74426

Closed
@ghost

Description

Currently std::fs::copy is using std::io::copy to copy files - it reads the file content into the memory, and then write it to the output file. std::fs::copy or (edit: it tries copy_file_range(2) first and has already benefited from the file cloning feature) std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE (and fallbacking to read-write loop if the filesystem does not support it).
I think is easy to use FICLONE in std::fs::copy, but it's less general (it requires file paths). std::io::copy is more general, but some additional methods may be needed to be added to std::io::Read and std::io::Write in order to use FICLONE.
libuv's support for FICLONE: libuv/libuv#1491
The ioctl_ficlonerange(2) man page: https://www.man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html

Activity

added
C-feature-requestCategory: A feature request, i.e: not implemented / a PR.
T-libsRelevant to the library team, which will review and decide on the PR/issue.
O-linuxOperating system: Linux
on Jul 17, 2020
cuviper

cuviper commented on Jul 17, 2020

@cuviper
Member

The Linux fs::copy at least tries to use copy_file_range to avoid the trip through userspace memory, but it will still end up with duplicate data on the device.

GNU coreutils also tries to use FICLONE for cp --reflink, but that's an explicit opt-in.
https://github.com/coreutils/coreutils/blob/6a3d2883fed853ee01079477020091068074e12d/src/copy.c#L408

the8472

the8472 commented on Jul 17, 2020

@the8472
Member

The Linux fs::copy at least tries to use copy_file_range to avoid the trip through userspace memory, but it will still end up with duplicate data on the device.

My understanding is that copy_file_range already uses reflink copies if supported. So all you need is a sufficiently recent kernel.
https://github.com/torvalds/linux/blob/b1b11d0063aab28aaf65f63cff56470bc01dc290/fs/read_write.c#L1728-L1744

std::fs::copy or std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE

io::copy would require specialization to make this work and this would be an unreliable optimization since wrapper types such as BufWriter could obscur the underlying files from the specialization. As mentioned above fs::copy should already be using reflinks.

cuviper

cuviper commented on Jul 17, 2020

@cuviper
Member

My understanding is that copy_file_range already uses reflink copies if supported.

Thanks, I didn't realize it did that!

ghost

ghost commented on Jul 18, 2020

@ghost

My understanding is that copy_file_range already uses reflink copies if supported.

Thanks, I didn't realize it did that!

Oops, I didn't realize that either. The copy_file_range(2) man page says nothing about it.

std::fs::copy or std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE

io::copy would require specialization to make this work and this would be an unreliable optimization since wrapper types such as BufWriter could obscur the underlying files from the specialization. As mentioned above fs::copy should already be using reflinks.

Then, if you think there is nothing to do, feel free to close this issue.

the8472

the8472 commented on Jul 18, 2020

@the8472
Member

Specializing io::copy may still be useful for generic code operating on Read/Write traits or already opened files.

added
T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.
on Jul 20, 2020
joshtriplett

joshtriplett commented on Jul 20, 2020

@joshtriplett
Member

Tagging T-libs, as it seems like a libs team question whether it would be valid to change the underlying semantics of these copy operations.

It sounds like std::fs::copy is already doing the right thing. std::io::copy might be able to use more specialized operations in various cases, such as splice or FICLONE, or at the very least, sendfile. (Or, long-term, io_uring.)

ghost

ghost commented on Jul 20, 2020

@ghost

Specializing std::io::copy is also useful to clone only a part of a file to only a part of another file - for example, when creating a tar archive.
Also, #60689 may get resolved if this is resolved.

ghost changed the title [-]Use FICLONE for std::fs::copy or std::io::copy on Linux[/-] [+]Specialize std::io::copy for files, sockets, or pipes on Linux[/+] on Jul 20, 2020
the8472

the8472 commented on Jul 22, 2020

@the8472
Member

Since this would require specializing on the input and output type the naive approach would lead to quadratic blowup of types. So it's probably better to have an internal highly parameterized copy helper and two separate sets of specializations that each prepare one half of the parameter set.
That would allow extracting the file descriptors and some ancillary things from a large set of concrete wrapper types, i.e. it could also cover #49921 while avoiding the concerns in #71091 at the cost of having to spell out each specialized type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-feature-requestCategory: A feature request, i.e: not implemented / a PR.O-linuxOperating system: LinuxT-libsRelevant to the library team, which will review and decide on the PR/issue.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @cuviper@joshtriplett@the8472@jonas-schievink

      Issue actions

        Specialize std::io::copy for files, sockets, or pipes on Linux · Issue #74426 · rust-lang/rust