Uninstalling CUDA 8.0 and installing CUDA 9.0, cuDNN 7.3, and tensorflow-gpu 1.8.0 on CentOS 7

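I have recently been working through the book 《21个项目玩转深度学习》 (21 Projects for Mastering Deep Learning). While training on my own dataset with the Object Detection API, I hit the following error: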

AttributeError: 'module' object has no attribute 'parallel_interleave'

The fix is to install tensorflow-gpu 1.8.0, which in turn means uninstalling CUDA 8.0 and installing CUDA 9.0 and cuDNN 7.3.

System configuration: CentOS 7.3, 64-bit | Python 2.7.15

  • Uninstall CUDA 8.0

cd /usr/local/cuda-8.0/bin
sudo ./uninstall_cuda_8.0.pl

The uninstall script leaves the cuDNN files behind; you can then delete the cuda-8.0 directory itself to clear them out.
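The leftover directory can be removed with a small guarded helper. This is only a sketch: the function name remove_cuda_tree is made up for illustration, and the guard simply refuses any path whose last component does not look like cuda-<version>, so a typo cannot delete something unrelated.

```shell
# Hypothetical helper (not part of CUDA): delete a leftover CUDA tree,
# but only if the path ends in a "cuda-<digit>..." component.
remove_cuda_tree() {
    dir=$1
    case "$dir" in
        */cuda-[0-9]*) ;;                       # looks like /usr/local/cuda-8.0
        *) echo "refusing to remove: $dir" >&2
           return 1 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "nothing to remove at $dir"
        return 0
    fi
    rm -rf -- "$dir"
}

# Example (needs root for /usr/local):
#   remove_cuda_tree /usr/local/cuda-8.0
```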

  • Download the CUDA 9.0 and cuDNN 7.3 packages

Go to https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=runfilelocal to download the CUDA 9.0 installer, cuda_9.0.176_384.81_linux.run. Select "Linux", "x86_64", "CentOS", "7", and "runfile (local)" in turn, then click Download under Base Installer.

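Download the cuDNN 7.3 archive, cudnn-9.0-linux-x64-v7.3.0.29.tgz, from https://developer.nvidia.com/rdp/cudnn-archive. Downloading cuDNN requires a registered account. Select "Download cuDNN v7.3.0 [Sept 19, 2018], for CUDA 9.0" and click "cuDNN v7.3.0 Library for Linux" to start the download. Extracting the archive gives the cudnn-9.0-linux-x64-v7.3.0.29 directory.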

Upload cuda_9.0.176_384.81_linux.run and the cudnn-9.0-linux-x64-v7.3.0.29 directory to the server via Xftp.
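Large installers occasionally arrive truncated, so it can be worth comparing checksums before and after the upload. A minimal sketch, assuming a SHA-256 value for the file was computed on the local machine beforehand; the helper name same_checksum is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE has the EXPECTED SHA-256 digest.
same_checksum() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep the digest only.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example on the server (paste the digest computed locally):
#   same_checksum cuda_9.0.176_384.81_linux.run "<digest from local machine>" \
#       && echo "upload intact" || echo "upload differs, transfer again"
```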

  • Install CUDA 9.0

sh cuda_9.0.176_384.81_linux.run

The installer first shows the license text; press q to skip past it.

Then answer the prompts as follows:

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]: # just press Enter

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /root ]: # just press Enter
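The same answers can usually be given on the command line instead of interactively. The sketch below only prints the command so it can be inspected first. It assumes the runfile accepts --silent, --toolkit, --toolkitpath, --samples, and --samplespath (confirm with sh cuda_9.0.176_384.81_linux.run --help), and the function name cuda_silent_cmd is made up. Leaving out --driver corresponds to answering "n" to the driver prompt. Whether the /usr/local/cuda symbolic link is created in silent mode is not covered here and should be checked after installation.

```shell
# Hypothetical helper: print a non-interactive install command for the
# CUDA 9.0 runfile (toolkit + samples, no driver). Nothing is executed.
cuda_silent_cmd() {
    runfile=$1        # e.g. cuda_9.0.176_384.81_linux.run
    samples_dir=$2    # e.g. /root
    printf 'sh %s --silent --toolkit --toolkitpath=/usr/local/cuda-9.0 --samples --samplespath=%s\n' \
        "$runfile" "$samples_dir"
}

# Example: print the command, review it, then run it by hand.
#   cuda_silent_cmd cuda_9.0.176_384.81_linux.run /root
```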

  • Configure environment variables

First, check whether nvcc is present in the cuda-9.0 bin directory:

cd /usr/local/cuda-9.0/bin
ls nvcc

If it is there, all that remains is to add the CUDA directories to the search paths:

Open the configuration file with vim ~/.bashrc,
then add the following two lines:
export PATH="$PATH:/usr/local/cuda-9.0/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"

Then reload the configuration file:

source ~/.bashrc
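If this step gets repeated (for example on several servers), the two export lines can be appended without opening an editor and without creating duplicates. A minimal sketch; append_once is a hypothetical helper name.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line
# is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
#   append_once 'export PATH="$PATH:/usr/local/cuda-9.0/bin"' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"' ~/.bashrc
```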

Running nvcc --version now reports the installed CUDA version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
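To check the version in a script rather than by eye, the release number can be pulled out of that output. A small sketch, assuming the "release X.Y" wording shown above; cuda_release is a hypothetical helper name.

```shell
# Hypothetical helper: read `nvcc --version` text on stdin and print
# the release number (for example 9.0).
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example:
#   [ "$(nvcc --version | cuda_release)" = "9.0" ] && echo "CUDA 9.0 is active"
```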

  • Install cuDNN

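Copy the contents of lib64 and include from the cudnn-9.0-linux-x64-v7.3.0.29/cuda directory into /usr/local/cuda-9.0/lib64 and /usr/local/cuda-9.0/include respectively (run these commands from inside that cuda directory):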

sudo cp include/cudnn.h /usr/local/cuda-9.0/include/
sudo cp -a lib64/libcudnn* /usr/local/cuda-9.0/lib64/
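A quick way to confirm that the right header landed in place is to read the version macros from cudnn.h. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain #define lines, which is the case for cuDNN 7.x; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cudnn.h header.
cudnn_version() {
    header=$1
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

# Example:
#   cudnn_version /usr/local/cuda-9.0/include/cudnn.h
```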

  • Install tensorflow-gpu 1.8.0

Download tensorflow_gpu-1.8.0-cp27-cp27mu-manylinux1_x86_64.whl from https://pypi.org/project/tensorflow-gpu/1.8.0/#files, upload it to the server via Xftp, and install it:

pip install tensorflow_gpu-1.8.0-cp27-cp27mu-manylinux1_x86_64.whl

This may fail with: Failed building wheel for grpcio

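The cause is that grpcio is not installed yet: the installer pulls the latest release (1.31.0 at the time of writing), which does not work with Python 2.7. Instead, download the grpcio 1.8.6 wheel, grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl, from https://pypi.org/project/grpcio/1.8.6/#files, upload it to the server via Xftp, and install it: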

pip install grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl
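Both wheels above were picked by their filename tags (cp27, cp27mu, manylinux1_x86_64), which have to match the interpreter on the server. The sketch below splits a wheel filename into those three tags so a download can be checked before uploading; wheel_tags is a hypothetical helper name, and it relies only on the standard wheel naming scheme in which the last three dash-separated fields are the Python, ABI, and platform tags.

```shell
# Hypothetical helper: print the Python, ABI and platform tags of a wheel file.
wheel_tags() {
    base=$(basename "$1" .whl)
    platform=${base##*-}      # text after the last dash
    rest=${base%-*}
    abi=${rest##*-}
    rest=${rest%-*}
    python=${rest##*-}
    printf 'python=%s abi=%s platform=%s\n' "$python" "$abi" "$platform"
}

# Example:
#   wheel_tags grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl
# The "u" in cp27mu denotes a wide-unicode (UCS-4) build of Python 2.7;
# on such a build, python -c "import sys; print(sys.maxunicode)" prints 1114111.
```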

Installing tensorflow-gpu 1.8.0 again now succeeds, but importing it fails with: ImportError: /usr/local/cuda-9.0/lib64/libcudnn.so.7: file too short

The fix:

First delete libcudnn.so.7 and libcudnn.so.7.3.0 from /usr/local/cuda-9.0/lib64;

then copy libcudnn.so.7.3.0 from the downloaded cuDNN directory into /usr/local/cuda-9.0/lib64 again:

cp libcudnn.so.7.3.0 /usr/local/cuda-9.0/lib64/

Change into /usr/local/cuda-9.0/lib64/ and run:

ln -s libcudnn.so.7.3.0 libcudnn.so.7

After this, import tensorflow works.
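The repair steps can be collected into one function. This is a sketch under assumptions: relink_cudnn is a made-up name, and the source file must live outside the target directory, since the stale copies in the target are deleted first. One plausible cause of "file too short" (an assumption, not something confirmed above) is that the symbolic link was flattened into a stub file when the extracted directory was transferred, which is why the link is recreated rather than copied.

```shell
# Hypothetical helper: replace a broken libcudnn.so.7 with a fresh copy of
# the real library plus a relative symbolic link pointing at it.
relink_cudnn() {
    src=$1       # real library from the archive, e.g. .../lib64/libcudnn.so.7.3.0
    libdir=$2    # e.g. /usr/local/cuda-9.0/lib64 (must not contain src itself)
    real=$(basename "$src")
    rm -f "$libdir/libcudnn.so.7" "$libdir/$real"
    cp "$src" "$libdir/$real"
    # Create the link from inside libdir so it stays relative.
    ( cd "$libdir" && ln -s "$real" libcudnn.so.7 )
}

# Example (needs root for /usr/local):
#   relink_cudnn lib64/libcudnn.so.7.3.0 /usr/local/cuda-9.0/lib64
#   ls -l /usr/local/cuda-9.0/lib64/libcudnn.so.7    # should show "-> libcudnn.so.7.3.0"
```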

 

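Copyright notice: This is an original article by Alpha_P, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.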