
std::fs::copy fails on NFS volumes on CentOS 7 #75387

Closed

Description

@Gaelan

I tried this code:

// On a CentOS 7 VM, where the current directory is an NFS mount containing a file called "a" with any content
use std::fs;

fn main() {
  println!("Hello, world!");
  fs::copy("a", "b").unwrap();
}

I expected to see this happen: The file is successfully copied

Instead, this happened:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 95, kind: Other, message: "Operation not supported" }', src/main.rs:4:4

Meta

rustc --version --verbose:

[Screenshot: output of rustc --version --verbose]

Backtrace

[Screenshots: backtrace, three images]

Apologies for screenshots, I'm running in a VM I can't easily copy/paste out of.

See also rust-lang/rustup#2452, which has a likely explanation of the cause of this.
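Until the standard library handles this case, a caller could work around the failure by retrying with a plain read/write loop when fs::copy reports OS error 95. The following is a minimal sketch under stated assumptions, not an official fix: copy_with_fallback and manual_copy are hypothetical helper names, the errno value is the Linux one, and unlike fs::copy the manual path does not copy file permissions.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

// Linux errno value for EOPNOTSUPP (assumption: Linux target, as in the report).
const EOPNOTSUPP: i32 = 95;

/// Hypothetical helper: try fs::copy first, and fall back to a manual
/// read/write loop only when the OS reports EOPNOTSUPP.
fn copy_with_fallback(from: &Path, to: &Path) -> io::Result<u64> {
    match fs::copy(from, to) {
        Err(e) if e.raw_os_error() == Some(EOPNOTSUPP) => manual_copy(from, to),
        other => other,
    }
}

/// Hypothetical helper: copy the file contents with plain reads and writes.
/// Note: this does not preserve permissions, unlike fs::copy.
fn manual_copy(from: &Path, to: &Path) -> io::Result<u64> {
    let mut src = File::open(from)?;
    let mut dst = File::create(to)?;
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

In the reproduction above, the call would become copy_with_fallback(Path::new("a"), Path::new("b")).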

Activity

Tavi-Kohn commented on Aug 11, 2020

This bug doesn't occur every time, because I think fs::copy remembers if the copy_file_range system call is available. If fs::copy is called with two files on the same XFS filesystem, the copy_file_range call fails there. fs::copy then sets a flag to always fall back to a more generic copy method, which prevents the bug from occurring on subsequent calls.

Code Example
use std::fs::copy;

// Placeholder paths: source and destination on the same NFS mount.
const NFS_IN: &str = "/some/nfs/mount/in";
const NFS_OUT: &str = "/some/nfs/mount/out";

// Placeholder paths: source and destination on the same XFS mount.
const XFS_IN: &str = "/some/xfs/mount/in";
const XFS_OUT: &str = "/some/xfs/mount/out";

fn main() {
    println!("{:?}", copy(NFS_IN, NFS_OUT)); // Err(Os { code: 95, kind: Other, message: "Operation not supported" })
    println!("{:?}", copy(XFS_IN, XFS_OUT)); // Ok(1)
    println!("{:?}", copy(NFS_IN, NFS_OUT)); // Ok(1)
}
the8472 (Member) commented on Aug 11, 2020

This might be a kernel bug or a documentation error: EOPNOTSUPP is not listed in the copy_file_range man page as a possible error, and the kernel even has a warning that this shouldn't happen, along with a commit comment saying that it's the responsibility of the filesystem to perform the fallback.
And indeed, NFS does have fallback code.

So that's probably fixed in a newer kernel version, but who knows with Red Hat's frankenkernels.

the8472 (Member) commented on Aug 11, 2020

If a kernel update doesn't fix it, you could also report this to the distro maintainers; they might have missed something when backporting patches. I haven't looked at the CentOS kernel sources though, so that's just a guess.

Mark-Simulacrum (Member) commented on Aug 11, 2020

Cc @cuviper @joshtriplett, though not sure if you are the right people to ask about the possible kernel issue mentioned above.

Regardless, we will likely need to handle this ourselves, since kernel or distro updates will probably be slow.

the8472 (Member) commented on Aug 11, 2020

Ok, should be easy enough. @rustbot claim

the8472 (Member) commented on Aug 11, 2020

> This bug doesn't occur every time, because I think fs::copy remembers if the copy_file_range system call is available.

It only remembers that if it encounters specific error codes (ENOSYS or EPERM but not EOPNOTSUPP). So it should also try copy_file_range on the second attempt and encounter the same error.
Could this be related to automounting?

Can you trace the syscalls of your test program via strace -ff [...] and post the output?
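To make the described behaviour concrete, here is a small model of the detection logic. It is only a sketch of what the comments in this thread describe, not the actual standard library source: Copier, Outcome, and the closure parameters are hypothetical names, the syscall is simulated by a closure returning an errno, and the errno values are the Linux ones.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Linux errno values (assumption: Linux target).
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;
const EOPNOTSUPP: i32 = 95;

#[derive(Debug, PartialEq)]
enum Outcome {
    Kernel(u64),   // copy_file_range succeeded
    Fallback(u64), // generic read/write copy was used
    Failed(i32),   // errno was passed straight to the caller
}

/// Hypothetical model of a copier that remembers whether copy_file_range works.
struct Copier {
    cfr_available: AtomicBool,
}

impl Copier {
    fn new() -> Self {
        Copier { cfr_available: AtomicBool::new(true) }
    }

    /// `try_cfr` simulates the syscall; `fallback` simulates the generic copy.
    fn copy(
        &self,
        try_cfr: impl Fn() -> Result<u64, i32>,
        fallback: impl Fn() -> u64,
    ) -> Outcome {
        if self.cfr_available.load(Ordering::Relaxed) {
            match try_cfr() {
                Ok(n) => return Outcome::Kernel(n),
                // Only these errnos mark the syscall as unavailable.
                Err(ENOSYS) | Err(EPERM) => {
                    self.cfr_available.store(false, Ordering::Relaxed);
                }
                // Anything else, including EOPNOTSUPP, is returned as an error.
                Err(errno) => return Outcome::Failed(errno),
            }
        }
        Outcome::Fallback(fallback())
    }
}
```

Under this model, an NFS copy that sees EOPNOTSUPP fails, a later XFS copy that sees ENOSYS clears the flag, and every copy after that skips the syscall, which would match the NFS, XFS, NFS sequence reported earlier.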

cuviper (Member) commented on Aug 11, 2020

I know that RHEL 7.8 disabled copy_file_range -- see the last note in the 7.8 release notes, section 9.4:

The copy_file_range() call has been disabled on local file systems and in NFS

The copy_file_range() system call on local file systems contains multiple issues that are difficult to fix. To avoid file corruptions, copy_file_range() support on local file systems has been disabled in RHEL 7.8. If an application uses the call in this case, copy_file_range() now returns an ENOSYS error.

For the same reason, the server-side-copy feature has been disabled in the NFS server. However, the NFS client still supports copy_file_range() when accessing a server that supports server-side-copy.

However, I think an EOPNOTSUPP from NFS accidentally leaked through, and will be changed to ENOSYS too:
https://bugzilla.redhat.com/show_bug.cgi?id=1783554

the8472 (Member) commented on Aug 11, 2020

Oh, that's a mess. So it returns ENOSYS in most cases, but in some cases copy_file_range would still succeed? The detection logic is overly pessimistic then, but I guess that's OK if it was broken in older CentOS versions.
I'll treat EOPNOTSUPP like ENOSYS then.

joshtriplett (Member) commented on Aug 12, 2020

Yes, treating EOPNOTSUPP as ENOSYS here seems like an appropriate workaround for the kernel bug.
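The agreed workaround amounts to widening the set of error codes that mean "copy_file_range is unavailable, use the generic copy instead". A minimal sketch of such a check follows; copy_file_range_unavailable is a hypothetical name, the errno values are the Linux ones, and the actual change in the standard library may be structured differently.

```rust
use std::io;

// Linux errno values (assumption: Linux target).
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;
const EOPNOTSUPP: i32 = 95;

/// Hypothetical predicate: should this error trigger the generic fallback?
/// EOPNOTSUPP is treated the same way as ENOSYS and EPERM.
fn copy_file_range_unavailable(err: &io::Error) -> bool {
    matches!(
        err.raw_os_error(),
        Some(ENOSYS) | Some(EPERM) | Some(EOPNOTSUPP)
    )
}
```

Errors without an OS code, or with any other code, are still reported to the caller unchanged.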

Tavi-Kohn commented on Aug 12, 2020

I've run strace on the test program I wrote earlier.
For two files on an XFS filesystem:

copy_file_range(3, NULL, 4, NULL, 1, 0) = -1 ENOSYS (Function not implemented)

On an NFS filesystem:

copy_file_range(3, NULL, 4, NULL, 1, 0) = -1 EOPNOTSUPP (Operation not supported)


Metadata

Labels: C-bug (Category: This is a bug.), T-libs (Relevant to the library team, which will review and decide on the PR/issue.)

Participants: @cuviper, @joshtriplett, @the8472, @jonas-schievink, @Gaelan