Description
PR #126 added an extra step that runs `xfs_repair` before mounting an XFS file system. However, instead of helping to automatically correct file system issues caused by prior unclean shutdowns, it actually prevented that automatic recovery from happening, which led to complete unavailability of the corresponding volume and subsequently required manual human intervention.
The sequence of events is as follows:
1. A node loss / unclean shutdown occurs.
2. A stateful pod is restarted on another healthy node; its volume is re-attached to the new node.
3. `xfs_repair` is run against the volume. The relevant logs look like the following (captured in a rook-ceph context, but they should apply to any other user of the mounter):

   ```
   Filesystem corruption was detected for /dev/rbd1, running xfs_repair to repair
   ID: 29 Req-ID: 0001-0009-rook-ceph-0000000000000001-9adb43bf-4e25-11ea-aa19-2ecc193be507 failed to mount device path (/dev/rbd1) to staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-423b5a86-7c03-43c4-a7e9-4921934016de/globalmount/0001-0009-rook-ceph-0000000000000001-9adb43bf-4e25-11ea-aa19-2ecc193be507) for volume (0001-0009-rook-ceph-0000000000000001-9adb43bf-4e25-11ea-aa19-2ecc193be507) error 'xfs_repair' found errors on device /dev/rbd1 but could not correct them: Phase 1 - find and verify superblock...
   Phase 2 - using internal log
   - zero log...
   ERROR: The filesystem has valuable metadata changes in a log which needs to
   be replayed. Mount the filesystem to replay the log, and unmount it before
   re-running xfs_repair. If you are unable to mount the filesystem, then use
   the -L option to destroy the log and attempt a repair.
   Note that destroying the log may cause corruption -- please attempt a mount
   of the filesystem before doing this.
   ```
4. The volume is then prevented from being mounted, and manual intervention is required.
5. All that's needed at that point is to manually mount the volume (which replays the XFS log automatically), then unmount it and restart the corresponding pod.
Note that step #5 is exactly what has always happened prior to this change: the volume was simply mounted, without any attempt to perform an FS check or run `xfs_repair`, and it could correct itself as part of just being mounted, as per the XFS design.
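For illustration, here is a minimal sketch of that pre-change behavior, assuming a hypothetical `mountXFS` helper that shells out to `mount` (this is not the package's actual API):

```go
// Sketch of the pre-change behavior: just mount, with no FS check or
// xfs_repair step beforehand. If the previous shutdown was unclean, the
// kernel's XFS driver replays the metadata log during the mount itself.
package mounter

import (
	"fmt"
	"os/exec"
)

// mountXFS is a hypothetical helper, shown here only for illustration.
func mountXFS(device, target string) error {
	out, err := exec.Command("mount", "-t", "xfs", device, target).CombinedOutput()
	if err != nil {
		return fmt.Errorf("mounting %s on %s failed: %v, output: %q", device, target, err, out)
	}
	return nil
}
```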
The recommended fix is to only attempt to run `xfs_repair` if mounting actually fails, as a last resort. There shouldn't be any need to run `xfs_repair` prior to a mount failure.
Alternatively, don't bail out if an error occurs when running `xfs_repair`; let the mount attempt happen anyway. It'll then either fix itself, or fail to mount with a different error.
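To make the recommended ordering concrete, here is a hedged sketch reusing the hypothetical `mountXFS` helper (and imports) from the sketch above; it assumes `xfs_repair` is invoked directly on the device, as the logs suggest, and is again not the package's actual API:

```go
// Sketch of the recommended fix: attempt the mount first, and fall back
// to xfs_repair only if the mount itself fails.
func mountWithRepairFallback(device, target string) error {
	// First attempt: a plain mount lets the XFS driver replay its own log.
	if err := mountXFS(device, target); err == nil {
		return nil
	}

	// Last resort: the mount failed, so now try xfs_repair. Running it
	// before the mount is what broke recovery here, because xfs_repair
	// refuses to touch a filesystem whose log is dirty until the log has
	// been replayed by a mount (or destroyed with -L).
	if out, repairErr := exec.Command("xfs_repair", device).CombinedOutput(); repairErr != nil {
		return fmt.Errorf("xfs_repair on %s failed: %v, output: %q", device, repairErr, out)
	}

	// Retry the mount once the repair has succeeded.
	return mountXFS(device, target)
}
```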
Relevant issue from the rook-ceph repo: rook/rook#4914
CC'ing @27149chen