[nvidia-smi error] Failed to initialize NVML: Driver/library version mismatch
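Problem:
Running the nvidia-smi command to check the GPU status fails with:
Failed to initialize NVML: Driver/library version mismatch
Running nvcc -V to check the CUDA version, however, works normally.
Analysis and fix:
nvcc -V only reports the installed CUDA toolkit and does not talk to the driver, so it keeps working. The error comes from NVML: the user-space driver library no longer matches the NVIDIA kernel module that is currently loaded, typically because the driver version changed underneath a running system.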
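To confirm the diagnosis before reinstalling anything, it can help to compare the version of the loaded kernel module with the version of the module installed on disk. The following is a minimal sketch, not part of the original fix: extract_version and compare_versions are hypothetical helper names, and it assumes /proc/driver/nvidia/version and modinfo are available on the machine.

```shell
#!/bin/sh
# Sketch: compare the loaded NVIDIA kernel module with the one installed on disk.
# extract_version and compare_versions are illustrative helpers, not NVIDIA tools.

# Pull the first dotted version number (e.g. 525.60.13) out of stdin.
extract_version() {
    grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Print "match", "mismatch", or "unknown" (when either version is empty).
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Version of the module currently loaded in the kernel (empty if not loaded).
loaded=$(cat /proc/driver/nvidia/version 2>/dev/null | extract_version)

# Version of the nvidia module installed on disk for the running kernel.
installed=$(modinfo -F version nvidia 2>/dev/null)

echo "loaded=$loaded installed=$installed: $(compare_versions "$loaded" "$installed")"
```

If this prints "mismatch", the loaded module is stale relative to the installed driver, which is consistent with the NVML error above.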
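My own quick-and-dirty fix is to re-download the installer for the CUDA version already in use and install only the driver from it, not the CUDA toolkit; that restores normal operation. For example, I am on cuda-12.0, so I download and run: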
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
sudo sh cuda_12.0.0_525.60.13_linux.run
After the installation, nvidia-smi shows the GPU status normally again.
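The command above opens the installer's interactive menu, where the toolkit has to be deselected by hand. A non-interactive variant might look like the sketch below. It assumes the runfile accepts the --silent and --driver options described in NVIDIA's CUDA installation guide; driver_only_cmd and the RUN variable are hypothetical names introduced here, and the script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Runfile name taken from the article; adjust it for your CUDA version.
RUNFILE="cuda_12.0.0_525.60.13_linux.run"

# Build the driver-only install command (hypothetical helper).
# --silent skips the interactive menu; --driver selects only the driver component.
driver_only_cmd() {
    echo "sh $1 --silent --driver"
}

# Print the command by default; execute it only when RUN=1 and the file exists.
if [ "${RUN:-0}" = "1" ] && [ -f "$RUNFILE" ]; then
    sudo $(driver_only_cmd "$RUNFILE")
else
    echo "would run: sudo $(driver_only_cmd "$RUNFILE")"
fi
```

Printing first and executing only on request is a deliberate choice here, since a driver install is hard to undo on a machine that is in use.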
The driver installation itself may fail, however, for example with:
ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
Check which NVIDIA kernel modules are loaded:
lsmod | grep nvidia
nvidia_uvm            995356  2
nvidia_drm             53134  0
nvidia_modeset       1195268  1 nvidia_drm
nvidia              35237551  14 nvidia_modeset,nvidia_uvm
drm_kms_helper        179394  2 i915,nvidia_drm
drm                   429744  5 i915,drm_kms_helper,nvidia,nvidia_drm
Find the processes that are holding the NVIDIA devices open and terminate them:
lsof /dev/nvidia*
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sbatchd 3680 root 5u CHR 195,255 0t0 56434 /dev/nvidiactl
sbatchd 3680 root 6u CHR 237,0 0t0 52212 /dev/nvidia-uvm
sbatchd 3680 root 7u CHR 195,0 0t0 54226 /dev/nvidia0
sbatchd 3680 root 8u CHR 195,0 0t0 54226 /dev/nvidia0
sbatchd 3680 root 9u CHR 195,0 0t0 54226 /dev/nvidia0
kill -9 3680
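When several processes hold the devices, reading the PIDs off the lsof table by hand gets tedious. As a small sketch, the hypothetical helper gpu_holders below reduces lsof output to one "PID COMMAND" line per process. It relies only on the column layout shown above (command in column 1, PID in column 2).

```shell
#!/bin/sh
# Print one "PID COMMAND" line per process from lsof output on stdin.
# NR > 1 skips the header row; seen[] drops repeated PIDs.
gpu_holders() {
    awk 'NR > 1 && !seen[$2]++ { print $2, $1 }'
}

# List the processes currently holding NVIDIA device files (empty if none).
lsof /dev/nvidia* 2>/dev/null | gpu_holders
```

For the listing above this yields a single line, "3680 sbatchd". It is usually worth trying a plain kill (SIGTERM) on each PID first and falling back to kill -9 only if the process does not exit.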
Unload the corresponding kernel modules, then run the installer again:
sudo sh cuda_12.0.0_525.60.13_linux.run
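The article does not show the unload step itself. A minimal sketch of it, based on the dependency columns in the lsmod output above, could look like this: dependent modules are removed first and the core nvidia module last. The function name unload_nvidia_modules and the RUN variable are assumptions made for illustration; by default the script only prints the rmmod commands.

```shell
#!/bin/sh
# Unload order derived from the lsmod output: dependents first, core module last.
MODULES="nvidia_drm nvidia_modeset nvidia_uvm nvidia"

# Print the rmmod commands; set RUN=1 (as root) to actually execute them.
unload_nvidia_modules() {
    for m in $MODULES; do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first failure, e.g. when a module is still in use.
            rmmod "$m" || return 1
        else
            echo "rmmod $m"
        fi
    done
}

unload_nvidia_modules
```

If rmmod reports that a module is in use, some process still has a device file open, and the lsof step needs to be repeated before trying again.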