mig-parted cannot perform a GPU reset while the nvidia_drm kernel module is loaded, so it cannot toggle MIG mode
on GPUs where a reset is required. In that case, the apply command fails with an error like the following:
$ sudo nvidia-mig-parted apply -f /etc/nvidia-mig-manager/config-default.yaml -k /etc/nvidia-mig-manager/hooks-default.yaml -c all-balanced
ERRO[0003]
The following GPUs could not be reset:
GPU 00000000:01:00.0: In use by another client
GPU 00000000:47:00.0: In use by another client
GPU 00000000:81:00.0: In use by another client
GPU 00000000:C2:00.0: In use by another client
4 devices are currently being used by one or more other processes (e.g., Fabric Manager, CUDA application, graphics application such as an X server, or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using these devices and all compute applications running in the system.
FATA[0008] Error applying MIG configuration with hooks: error resetting all GPUs: exit status 255
A workaround is to unload the nvidia_drm kernel module before applying a MIG configuration and load it again once the configuration change has been applied.
$ sudo rmmod nvidia_drm
$ sudo nvidia-mig-parted apply -f /etc/nvidia-mig-manager/config-default.yaml -k /etc/nvidia-mig-manager/hooks-default.yaml -c all-balanced
MIG configuration applied successfully
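The full workaround, including loading the module again afterwards, could be scripted roughly as follows. This is a minimal sketch, not part of mig-parted: the function names module_loaded and apply_mig_config and the MODULES_FILE variable are hypothetical helpers, and it assumes the module can be loaded again with modprobe. Note that rmmod itself fails if nvidia_drm is still in use (for example by a running display server), in which case that client has to be stopped first.

```shell
#!/usr/bin/env bash
# Sketch: unload nvidia_drm around a mig-parted apply, then load it again.
# module_loaded, apply_mig_config and MODULES_FILE are hypothetical names
# introduced for this example only.

MODULES_FILE="${MODULES_FILE:-/proc/modules}"

# Return 0 if the named kernel module is listed in the modules file.
module_loaded() {
    grep -q "^$1 " "$MODULES_FILE"
}

apply_mig_config() {
    local was_loaded=0
    local status

    # Only unload the module if it is actually loaded.
    if module_loaded nvidia_drm; then
        was_loaded=1
        sudo rmmod nvidia_drm || return 1
    fi

    sudo nvidia-mig-parted apply \
        -f /etc/nvidia-mig-manager/config-default.yaml \
        -k /etc/nvidia-mig-manager/hooks-default.yaml \
        -c all-balanced
    status=$?

    # Load the module again, whether or not the apply succeeded.
    if [ "$was_loaded" -eq 1 ]; then
        sudo modprobe nvidia_drm
    fi
    return "$status"
}
```

After sourcing the file, running apply_mig_config performs the three steps in order and returns the exit status of nvidia-mig-parted, so a failed apply is still reported after the module has been loaded again.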