I have been working through the book 《21个项目玩转深度学习》 recently, and while training on my own dataset with the Object Detection API I ran into this error:
AttributeError: 'module' object has no attribute 'parallel_interleave'
The fix is to install tensorflow-gpu 1.8.0, which in turn means uninstalling CUDA 8.0 and installing CUDA 9.0 with cuDNN 7.3.
System configuration: CentOS 7.3, 64-bit | Python 2.7.15
Uninstalling CUDA 8.0
cd /usr/local/cuda-8.0/bin
sudo ./uninstall_cuda_8.0.pl
The uninstaller leaves the cuDNN files behind; you can also go on and delete the cuda-8.0 folder itself.
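The two removal options above can be sketched as a small dry-run helper. This is only an illustration: the function name cuda8_removal_plan is made up here, and it prints the suggested command rather than running it, so nothing is deleted by accident.

```shell
#!/bin/sh
# Hypothetical helper: decide how to remove CUDA 8.0 based on what is present.
# It only prints a suggestion; run the printed command yourself.
cuda8_removal_plan() {
    prefix="${1:-/usr/local/cuda-8.0}"
    if [ -x "$prefix/bin/uninstall_cuda_8.0.pl" ]; then
        # The bundled uninstaller is still there: prefer it.
        echo "run: sudo $prefix/bin/uninstall_cuda_8.0.pl"
    elif [ -d "$prefix" ]; then
        # Only leftovers (for example cuDNN files) remain: delete the folder.
        echo "remove: sudo rm -rf $prefix"
    else
        echo "nothing to do: $prefix not found"
    fi
}

cuda8_removal_plan /usr/local/cuda-8.0
```

Calling it once before and once after the uninstaller shows which of the two steps is still outstanding.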
Downloading the CUDA 9.0 and cuDNN 7.3 packages
Go to https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=runfilelocal to download the CUDA 9.0 installer cuda_9.0.176_384.81_linux.run: select "Linux", "x86_64", "CentOS", "7" and "runfile (local)" in turn, then click Download under Base Installer.
Go to https://developer.nvidia.com/rdp/cudnn-archive to download the cuDNN 7.3 archive cudnn-9.0-linux-x64-v7.3.0.29.tgz. Downloading cuDNN requires a registered account. Select "Download cuDNN v7.3.0 [Sept 19.2018], for CUDA 9.0" and click "cuDNN v7.3.0 Library for Linux" to start the download; extracting the archive gives the cudnn-9.0-linux-x64-v7.3.0.29 folder.
Upload cuda_9.0.176_384.81_linux.run and the cudnn-9.0-linux-x64-v7.3.0.29 folder to the server with xftp.
Installing CUDA 9.0
sh cuda_9.0.176_384.81_linux.run
The license text is shown first; press q to skip straight past it.
Then answer the prompts as follows:
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]: # just press Enter
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /root ]: # just press Enter
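The same answers can probably be given without the interactive prompts. The sketch below only builds and prints such a command. It assumes the runfile accepts the --silent, --toolkit, --toolkitpath, --samples and --samplespath options (check with sh cuda_9.0.176_384.81_linux.run --help before relying on it), and the function name cuda9_silent_cmd is hypothetical. The driver is deliberately left out, matching the "n" answer above. Whether the /usr/local/cuda symbolic link is created in this mode is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print a non-interactive install command that mirrors
# the interactive answers (toolkit: yes, samples: yes, driver: no).
# Assumption: the runfile supports these flags; verify with --help first.
cuda9_silent_cmd() {
    runfile="$1"
    toolkit_dir="${2:-/usr/local/cuda-9.0}"   # default toolkit location
    samples_dir="${3:-/root}"                 # default samples location
    echo "sh $runfile --silent --toolkit --toolkitpath=$toolkit_dir --samples --samplespath=$samples_dir"
}

cuda9_silent_cmd cuda_9.0.176_384.81_linux.run
```

Printing the command instead of running it keeps the sketch safe to try on a machine that already has a working CUDA install.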
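Configuring environment variables
First, check that nvcc is present in the cuda-9.0 bin directory:
ls /usr/local/cuda-9.0/bin/nvcc
If it is there, just add the CUDA paths to the system paths:
Open the config file with vim ~/.bashrc;
Add the following two lines: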
export PATH="$PATH:/usr/local/cuda-9.0/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"
Then reload the config file:
source ~/.bashrc
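If you would rather not edit the file by hand, or expect to repeat these steps, the two export lines can be appended in a way that is safe to run more than once. This is a minimal sketch; add_line_once is a name invented for this example, and the demo writes to a throwaway file rather than to ~/.bashrc.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not duplicate it.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file; pass "$HOME/.bashrc" instead once you trust it.
demo_rc=$(mktemp)
add_line_once 'export PATH="$PATH:/usr/local/cuda-9.0/bin"' "$demo_rc"
add_line_once 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"' "$demo_rc"
add_line_once 'export PATH="$PATH:/usr/local/cuda-9.0/bin"' "$demo_rc"   # already there, skipped
cat "$demo_rc"
```

The single quotes matter: they keep $PATH and $LD_LIBRARY_PATH unexpanded so the literal export lines end up in the file.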
Running nvcc --version again now reports the CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
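For a scripted check, the release number can be pulled out of that output. The sketch below is one way to do it; nvcc_release is a hypothetical helper name, and it relies only on the "release X.Y" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number (the part after "release").
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Only query nvcc if it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | nvcc_release
fi
```

Comparing the printed value against 9.0 before installing TensorFlow catches the case where an older CUDA is still first on PATH.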
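Installing cuDNN
Copy the files under lib64 and include in the cudnn-9.0-linux-x64-v7.3.0.29/cuda folder into the lib64 and include folders under /usr/local/cuda-9.0 (run the commands from inside cudnn-9.0-linux-x64-v7.3.0.29/cuda):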
sudo cp include/cudnn.h /usr/local/cuda-9.0/include/
sudo cp -a lib64/libcudnn* /usr/local/cuda-9.0/lib64/
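To confirm that the copied header really is cuDNN 7.3.0, the version macros in cudnn.h can be read back. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7 headers do; cudnn_header_version is a name made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h file.
# Assumption: the header contains "#define CUDNN_MAJOR <n>" style lines.
cudnn_header_version() {
    header="${1:-/usr/local/cuda-9.0/include/cudnn.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Only run the check if the header has been installed.
if [ -r /usr/local/cuda-9.0/include/cudnn.h ]; then
    cudnn_header_version
fi
```

This checks the header only; the library files in lib64 are a separate copy and can still be wrong even when the header is right.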
Installing TensorFlow-gpu 1.8.0
Download tensorflow_gpu-1.8.0-cp27-cp27mu-manylinux1_x86_64.whl from https://pypi.org/project/tensorflow-gpu/1.8.0/#files, upload it to the server with xftp, and install it:
pip install tensorflow_gpu-1.8.0-cp27-cp27mu-manylinux1_x86_64.whl
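This may fail with: Failed building wheel for grpcio
The cause is that grpcio is not installed yet, so the installer fetches the latest release (1.31.0 at the time), which does not work with Python 2.7 here. Instead, download the grpcio 1.8.6 wheel grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl from https://pypi.org/project/grpcio/1.8.6/#files, upload it to the server with xftp, and install it: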
pip install grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl
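Both wheels above were picked by their file names, which encode the Python version, ABI and platform they were built for: cp27 means CPython 2.7, and cp27mu is the wide-unicode (UCS-4) build of it. The sketch below extracts those three tags from a wheel file name; wheel_tags is a hypothetical helper, not a pip feature.

```shell
#!/bin/sh
# Hypothetical helper: print the python tag, ABI tag and platform tag of a
# wheel, which are the last three dash-separated fields of its file name.
wheel_tags() {
    name=$(basename "$1" .whl)
    echo "$name" | awk -F- '{ print $(NF-2), $(NF-1), $NF }'
}

wheel_tags tensorflow_gpu-1.8.0-cp27-cp27mu-manylinux1_x86_64.whl
wheel_tags grpcio-1.8.6-cp27-cp27mu-manylinux1_x86_64.whl
```

To see whether the local interpreter is a wide-unicode build, python -c "import sys; print(sys.maxunicode)" prints 1114111 on Python 2.7 in that case; a cp27m wheel would be needed otherwise.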
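Install tensorflow-gpu 1.8.0 again. This time the install succeeds, but import fails with: ImportError: /usr/local/cuda-9.0/lib64/libcudnn.so.7: file too short
Fix (the error suggests libcudnn.so.7 ended up as a truncated file rather than a link to libcudnn.so.7.3.0):
First delete libcudnn.so.7 and libcudnn.so.7.3.0 from /usr/local/cuda-9.0/lib64;
then copy libcudnn.so.7.3.0 from the downloaded cudnn folder into /usr/local/cuda-9.0/lib64 again;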
cp libcudnn.so.7.3.0 /usr/local/cuda-9.0/lib64/
Change into /usr/local/cuda-9.0/lib64/ and run:
ln -s libcudnn.so.7.3.0 libcudnn.so.7
After that, import tensorflow works.
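The relinking step can be wrapped so that it refuses to link against a missing or empty library, which is the situation that produced the "file too short" error. This is a sketch under that assumption; relink_cudnn is a made-up name, and it covers only the symlink step, not the delete-and-copy steps before it.

```shell
#!/bin/sh
# Hypothetical helper: (re)create libcudnn.so.7 as a symlink to the real
# library, but only if the real library exists and is not empty.
relink_cudnn() {
    libdir="$1"   # e.g. /usr/local/cuda-9.0/lib64
    real="$2"     # e.g. libcudnn.so.7.3.0
    [ -s "$libdir/$real" ] || { echo "missing or empty: $libdir/$real" >&2; return 1; }
    # -s symbolic, -f replace an existing file or link, -n do not follow an old link
    ln -sfn "$real" "$libdir/libcudnn.so.7"
}

# Usage (as root): relink_cudnn /usr/local/cuda-9.0/lib64 libcudnn.so.7.3.0
```

Afterwards, ls -l /usr/local/cuda-9.0/lib64/libcudnn.so.7 should show an arrow pointing at libcudnn.so.7.3.0, and python -c "import tensorflow" should no longer raise the ImportError.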