Installing Keras on Linux: a detailed tutorial on configuring a TensorFlow/Keras deep learning environment on a Linux server for Kaggle competitions

A full guide to installing and configuring deep learning environments on a Linux server

Quick Guide

prepare

tools

MobaXterm (for windows)

ssh + vscode

for windows:

drag and drop files into MobaXterm to upload them to the server

use zip format

commands

view disk

du -d 1 -h

df -h
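
The same free-space check can be done from Python, which is handy inside a training script before writing large checkpoints. This is a minimal sketch using only the standard library; the helper name `disk_usage_gib` is made up for illustration.

```python
import shutil


def disk_usage_gib(path="."):
    """Return (total, used, free) in GiB for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib, usage.free / gib


if __name__ == "__main__":
    total, used, free = disk_usage_gib(".")
    print(f"total={total:.1f} GiB  used={used:.1f} GiB  free={free:.1f} GiB")
```

Unlike `du`, this reports the whole filesystem rather than one directory tree, so it corresponds to `df -h`.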

gpu and cpu usage

watch -n 1 nvidia-smi

top

view files and count

wc -l data.csv

# count how many folders

ls -lR | grep '^d' | wc -l

17

# count how many jpg files

ls -lR | grep '\.jpg$' | wc -l

1360

# view 10 images

ls train | head

ls test | head
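
The folder and image counts above can also be computed in Python, for example to sanity-check a dataset at the start of a notebook. A minimal sketch with `pathlib`; the function names `count_dirs` and `count_files` are illustrative only.

```python
from pathlib import Path


def count_dirs(root):
    """Count all real subdirectories below `root`, recursively (symlinks excluded)."""
    return sum(
        1 for p in Path(root).rglob("*")
        if p.is_dir() and not p.is_symlink()
    )


def count_files(root, suffix=".jpg"):
    """Count files below `root` whose name ends with `suffix`, ignoring case."""
    suffix = suffix.lower()
    return sum(
        1 for p in Path(root).rglob("*")
        if p.is_file() and p.name.lower().endswith(suffix)
    )


if __name__ == "__main__":
    print("folders:", count_dirs("."))
    print("jpg files:", count_files(".", ".jpg"))
```

Note that this version matches the suffix case-insensitively, so `.JPG` files are counted too, which the `grep` pipeline would miss.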

link datasets

# link

ln -s src dest

ln -s /data_1/kezunlin/datasets/ dl4cv/datasets

scp

scp -r node17:~/dl4cv ~/git/

scp -r node17:~/.keras ~/

tmux for background tasks

tmux new -s notebook

tmux ls

tmux attach -t notebook

tmux detach

wget download

# wget

# continue an interrupted download

wget -c url

# download a large file in the background

wget -b -c url

tail -f wget-log

# kill background wget

pkill -9 wget
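
For readers curious what `wget -c` does under the hood: it asks the server for only the bytes that are still missing, via an HTTP `Range` header. The sketch below illustrates that idea with the standard library. It is not a replacement for wget, and the helper names `resume_request` and `download_with_resume` are hypothetical.

```python
import os
import urllib.request


def resume_request(url, dest):
    """Build a request for the bytes that `dest` does not contain yet."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", f"bytes={offset}-")
    return req, offset


def download_with_resume(url, dest, chunk_size=1 << 20):
    """Append the missing part of `url` to `dest`; start over if Range is ignored."""
    req, offset = resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the Range header; 200 means a full body.
        mode = "ab" if offset and resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the local file is already complete, most servers answer with status 416 and `urlopen` raises `HTTPError`; a fuller version would catch that case.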

tips for training large models

terminal 1:

tmux new -s train

conda activate keras

time python train_alexnet.py

terminal 2:

tmux detach

tmux attach -t train

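Then close VS Code. Because training runs inside the tmux session, it keeps running after detaching; a training process started directly in the VS Code terminal would exit when VS Code is closed.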

cuda driver and toolkits

The cudatoolkit version depends on the CUDA driver version.

install nvidia-drivers

sudo add-apt-repository ppa:graphics-drivers/ppa

sudo apt-get update

sudo apt-cache search nvidia-*

# nvidia-384

# nvidia-396

sudo apt-get -y install nvidia-418

# test

nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

install cuda-toolkit (drivers)

remove all previous nvidia drivers

sudo apt-get -y purge 'nvidia-*'

download the CUDA 10.1 runfile installer from the NVIDIA download site

wget -b -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run

sudo sh cuda_10.1.243_418.87.00_linux.run

sudo ./cuda_10.1.243_418.87.00_linux.run

vim .bashrc

# for cuda and cudnn

export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

check cuda driver version

> cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019

GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)

>nvidia-smi

Tue Aug 27 17:36:35 2019

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |

|-------------------------------+----------------------+----------------------+

> nvidia-smi -L

GPU 0: Quadro RTX 8000 (UUID: GPU-acb01c1b-776d-cafb-ea35-430b3580d123)

GPU 1: Quadro RTX 8000 (UUID: GPU-df7f0fb8-1541-c9ce-e0f8-e92bccabf0ef)

GPU 2: Quadro RTX 8000 (UUID: GPU-67024023-20fd-a522-dcda-261063332731)

GPU 3: Quadro RTX 8000 (UUID: GPU-7f9d6a27-01ec-4ae5-0370-f0c356327913)

> nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2019 NVIDIA Corporation

Built on Sun_Jul_28_19:07:16_PDT_2019

Cuda compilation tools, release 10.1, V10.1.243
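
To compare the driver and CUDA versions in a script (for example before choosing a cudatoolkit build), the header line printed by `nvidia-smi` can be parsed. This is a sketch that assumes the header has the layout shown above; other driver versions may format it differently, and `parse_smi_header` is a made-up helper name.

```python
import re

# Matches the header line as it appears in the output above.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+"
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return the driver and CUDA versions found in nvidia-smi output, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return {"driver": match.group("driver"), "cuda": match.group("cuda")}


sample = "| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |"
print(parse_smi_header(sample))  # {'driver': '418.87.00', 'cuda': '10.1'}
```

On a live machine the input text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`. The function returns `None` when the header is absent, as with the "Driver/library version mismatch" error.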

install conda

./Anaconda3-2019.03-Linux-x86_64.sh

[yes]

[yes]

config channels

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

conda config --set show_channel_urls yes

install libraries

conclusions:

py37/keras: conda install -y tensorflow-gpu keras==2.2.5

py37/torch: conda install -y pytorch torchvision

py36/mxnet: conda install -y mxnet

keras 2.2.5 was released on 2019/8/23.

Add new Applications: ResNet101, ResNet152, ResNet50V2, ResNet101V2, ResNet152V2.

common libraries

conda install -y scikit-learn scikit-image pandas matplotlib pillow opencv seaborn

pip install imutils progressbar pydot pylint

Install imutils with pip rather than conda, to avoid downgrading tensorflow-gpu.

py37

cudatoolkit 10.0.130 0

cudnn 7.6.0 cuda10.0_0

tensorflow-gpu 1.13.1

py36

cudatoolkit anaconda/pkgs/main/linux-64::cudatoolkit-10.1.168-0

cudnn anaconda/pkgs/main/linux-64::cudnn-7.6.0-cuda10.1_0

tensorboard anaconda/pkgs/main/linux-64::tensorboard-1.14.0-py36hf484d3e_0

tensorflow anaconda/pkgs/main/linux-64::tensorflow-1.14.0-gpu_py36h3fb9ad6_0

tensorflow-base anaconda/pkgs/main/linux-64::tensorflow-base-1.14.0-gpu_py36he45bfe2_0

tensorflow-estima~ anaconda/cloud/conda-forge/linux-64::tensorflow-estimator-1.14.0-py36h5ca1d4c_0

tensorflow-gpu anaconda/pkgs/main/linux-64::tensorflow-gpu-1.14.0-h0d30ee6_0

imutils only supports Python 3.6 and 3.7.

mxnet only supports Python 3.5 and 3.6.

details

# remove py35

conda remove -n py35 --all

conda info --envs

conda create -n py37 python=3.7

conda activate py37

# common libraries

conda install -y scikit-learn pandas pillow opencv

pip install imutils

# imutils

conda search imutils

# py36 and py37

# Name Version Build Channel

imutils 0.5.2 py27_0 anaconda/cloud/conda-forge

imutils 0.5.2 py36_0 anaconda/cloud/conda-forge

imutils 0.5.2 py37_0 anaconda/cloud/conda-forge

# tensorflow-gpu and keras

conda install -y tensorflow-gpu keras

# install pytorch

conda install -y pytorch torchvision

# install mxnet

# method 1: pip

pip search mxnet

mxnet-cu80[mkl]/mxnet-cu90[mkl]/mxnet-cu91[mkl]/mxnet-cu92[mkl]/mxnet-cu100[mkl]/mxnet-cu101[mkl]

# method 2: conda

conda install mxnet

# py35 and py36

TensorFlow Object Detection API

download tensorflow models and rename models-master to tfmodels

vim ~/.bashrc

export PYTHONPATH=/home/kezunlin/dl4cv:/data_1/kezunlin/tfmodels/research:$PYTHONPATH

source ~/.bashrc

jupyter notebook

conda activate py37

conda install -y jupyter

install kernels

python -m ipykernel install --user --name=py37

Installed kernelspec py37 in /home/kezunlin/.local/share/jupyter/kernels/py37

config for server

python -c "import IPython;print(IPython.lib.passwd())"

Enter password:

Verify password:

sha1:ef2fb2aacff2:4ea2998699638e58d10d594664bd87f9c3381c04

jupyter notebook --generate-config

Writing default config to: /home/kezunlin/.jupyter/jupyter_notebook_config.py

vim .jupyter/jupyter_notebook_config.py

c.NotebookApp.ip = '*'

c.NotebookApp.password = u'sha1:xxx:xxx'

c.NotebookApp.open_browser = False

c.NotebookApp.port = 8888

c.NotebookApp.enable_mathjax = True

run jupyter in the background

tmux new -s notebook

jupyter notebook

# Ctrl+b, then d: detach from the session and leave it running

# Ctrl+d: exit the shell and close the session

access web and input password

test

py37

import cv2

cv2.__version__

import tensorflow as tf

import keras

import torch

import torchvision

cat .keras/keras.json

{

"epsilon": 1e-07,

"floatx": "float32",

"backend": "tensorflow",

"image_data_format": "channels_last"

}
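
The backend setting can also be read from Python without importing Keras, which is useful when checking a fresh environment. A minimal sketch, assuming the default config location `~/.keras/keras.json`; the helper name `keras_backend` is illustrative.

```python
import json
from pathlib import Path


def keras_backend(config_path=None):
    """Return the backend named in keras.json, or None if the file is missing."""
    path = Path(config_path) if config_path else Path.home() / ".keras" / "keras.json"
    if not path.exists():
        return None
    with path.open() as fh:
        return json.load(fh).get("backend")


if __name__ == "__main__":
    print(keras_backend())
```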

py36

import mxnet
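
Instead of typing each import by hand, a short script can try them all and report which libraries are missing from the active environment. This is a sketch; `report_versions` is a hypothetical helper, and the module list should be adjusted per environment (py37 or py36).

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or to None if the import fails."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    wanted = ["cv2", "tensorflow", "keras", "torch", "torchvision"]
    for name, version in report_versions(wanted).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```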

train demo

export

# use CPU only

export CUDA_VISIBLE_DEVICES=""

# use gpu 0 1

export CUDA_VISIBLE_DEVICES="0,1"

code

import os

os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"

start train

python train.py
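
When setting the variable from code, the order matters: it has to be set before the framework initialises CUDA, so in practice before `import tensorflow` or `import torch`. A minimal sketch; the wrapper `select_gpus` is a made-up convenience, not part of any library.

```python
import os


def select_gpus(*gpu_ids):
    """Restrict CUDA to the given GPU indices; no arguments means CPU only."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before importing tensorflow, keras, or torch: the variable is read
# when CUDA is first initialised, so changing it afterwards has no effect.
select_gpus(0, 1)
```

Calling `select_gpus()` with no arguments sets an empty string, which matches the CPU-only export shown above.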

~/.keras folder

view keras models and datasets

ls .keras/

datasets keras.json models

models saved to /home/kezunlin/.keras/models/

datasets saved to /home/kezunlin/.keras/datasets/

model list

xxx_kernels_notop.h5 for include_top = False

xxx_kernels.h5 for include_top = True
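
To see which cached weight files belong to which variant, the naming rule above can be applied to the files in `~/.keras/models/`. This sketch assumes the `_notop.h5` suffix convention described here holds for every cached file; `list_weight_files` is an illustrative name.

```python
from pathlib import Path


def list_weight_files(models_dir=None):
    """Split cached .h5 files into (with_top, without_top) lists of file names."""
    directory = Path(models_dir) if models_dir else Path.home() / ".keras" / "models"
    with_top, without_top = [], []
    if not directory.is_dir():
        return with_top, without_top
    for path in sorted(directory.glob("*.h5")):
        if path.name.endswith("_notop.h5"):
            without_top.append(path.name)  # weights for include_top=False
        else:
            with_top.append(path.name)     # weights for include_top=True
    return with_top, without_top


if __name__ == "__main__":
    top, notop = list_weight_files()
    print("include_top=True :", top)
    print("include_top=False:", notop)
```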

Datasets

mnist

cifar10

to skip download

wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

mv cifar-10-python.tar.gz ~/.keras/datasets/cifar-10-batches-py.tar.gz

to load data

from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

flowers-17

animals

panda images are WRONG !!!

counts

ls -lR animals/cat | grep ".jpg" | wc -l

1000

ls -lR animals/dog | grep ".jpg" | wc -l

1000

ls -lR animals/panda | grep ".jpg" | wc -l

1000

kaggle cats vs dogs

caltech101

download in the background

wget -b -c http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz

Kaggle API

install and config

conda activate keras

conda install kaggle

# download kaggle.json

mkdir -p ~/.kaggle

mv kaggle.json ~/.kaggle/kaggle.json

chmod 600 ~/.kaggle/kaggle.json

cat ~/.kaggle/kaggle.json

{"username":"xxx","key":"yyy"}

or by export

export KAGGLE_USERNAME=xxx

export KAGGLE_KEY=yyy

tips

Go to your Kaggle account page and select 'Create API Token'; kaggle.json will be downloaded.

Ensure kaggle.json is in the location ~/.kaggle/kaggle.json to use the API.
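
On a server without a browser, it can be simpler to create the token file directly. The sketch below writes `kaggle.json` with the same two fields and the same 600 permissions as the commands above; `write_kaggle_token` and `is_private` are hypothetical helper names, and the credentials are placeholders.

```python
import json
import os
import stat
from pathlib import Path


def write_kaggle_token(username, key, config_dir=None):
    """Write kaggle.json into `config_dir` (default ~/.kaggle), owner-only."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    # Create the file with mode 0600 so the key is never world-readable.
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"username": username, "key": key}, fh)
    # Also tighten a pre-existing file that had looser permissions.
    os.chmod(token_path, 0o600)
    return token_path


def is_private(path):
    """True if neither group nor others have any access to `path`."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0
```

A call such as `write_kaggle_token("xxx", "yyy")` is equivalent to the `mv` and `chmod 600` steps.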

check version

kaggle --version

Kaggle API 1.5.5

commands overview

commands

kaggle competitions {list, files, download, submit, submissions, leaderboard}

kaggle datasets {list, files, download, create, version, init}

kaggle kernels {list, init, push, pull, output, status}

kaggle config {view, set, unset}

download datasets

kaggle competitions download -c dogs-vs-cats

show leaderboard

kaggle competitions leaderboard dogs-vs-cats --show

teamId teamName submissionDate score

------ --------------------------------- ------------------- -------

71046 Pierre Sermanet 2014-02-01 21:43:19 0.98533

66623 Maxim Milakov 2014-02-01 18:20:58 0.98293

72059 Owen 2014-02-01 17:04:40 0.97973

74563 Paul Covington 2014-02-01 23:05:20 0.97946

74298 we've been in KAIST 2014-02-01 21:15:30 0.97840

71949 orchid 2014-02-01 23:52:30 0.97733

set default competition

kaggle config set --name competition --value dogs-vs-cats

- competition is now set to: dogs-vs-cats

kaggle config set --name competition --value dogs-vs-cats-redux-kernels-edition

dogs-vs-cats

dogs-vs-cats-redux-kernels-edition

submit

kaggle c submissions

- Using competition: dogs-vs-cats

- No submissions found

kaggle c submit -f ./submission.csv -m "first submit"

The competition has already ended, so submissions are no longer accepted.

Nvidia-docker and containers

install

sudo apt-get -y install docker.io

# Install nvidia-docker2 and reload the Docker daemon configuration

sudo apt-get install -y nvidia-docker2

sudo pkill -SIGHUP dockerd

restart (optional)

cat /etc/docker/daemon.json

{

"runtimes": {

"nvidia": {

"path": "nvidia-container-runtime",

"runtimeArgs": []

}

}

}

sudo systemctl enable docker

sudo systemctl start docker

if errors occur:

Job for docker.service failed because the control process exited with error code.

See "systemctl status docker.service" and "journalctl -xe" for details.

check /etc/docker/daemon.json
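
A frequent cause of this failure is a syntax error in `daemon.json`, such as a missing brace or a trailing comma. The sketch below checks that the file parses as JSON and contains the `nvidia` runtime entry shown above; `check_daemon_config` is an illustrative helper, not a Docker tool.

```python
import json


def check_daemon_config(path="/etc/docker/daemon.json"):
    """Return a list of problems found in the Docker daemon config (empty if none)."""
    try:
        with open(path) as fh:
            config = json.load(fh)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return [f"{path} must contain a JSON object at the top level"]
    problems = []
    nvidia = config.get("runtimes", {}).get("nvidia")
    if nvidia is None:
        problems.append('no "nvidia" entry under "runtimes"')
    elif "path" not in nvidia:
        problems.append('"nvidia" runtime has no "path"')
    return problems


if __name__ == "__main__":
    for problem in check_daemon_config() or ["daemon.json looks fine"]:
        print(problem)
```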

test

sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi

Thu Aug 29 00:11:32 2019

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

|===============================+======================+======================|

| 0 Quadro RTX 8000 Off | 00000000:02:00.0 Off | Off |

| 43% 67C P2 136W / 260W | 46629MiB / 48571MiB | 17% Default |

+-------------------------------+----------------------+----------------------+

| 1 Quadro RTX 8000 Off | 00000000:03:00.0 Off | Off |

| 34% 54C P0 74W / 260W | 0MiB / 48571MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

| 2 Quadro RTX 8000 Off | 00000000:82:00.0 Off | Off |

| 34% 49C P0 73W / 260W | 0MiB / 48571MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

| 3 Quadro RTX 8000 Off | 00000000:83:00.0 Off | Off |

| 33% 50C P0 73W / 260W | 0MiB / 48571MiB | 3% Default |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: GPU Memory |

| GPU PID Type Process name Usage |

|=============================================================================|

+-----------------------------------------------------------------------------+

Add your user to the docker group so that docker commands can be run without sudo.

command refs

sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi

sudo nvidia-docker run -t -i --privileged nvidia/cuda bash

sudo docker run -it --name kzl -v /home/kezunlin/workspace/:/home/kezunlin/workspace nvidia/cuda

History

20190821: created.

Copyright

Post author: kezunlin

Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stated otherwise.


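Copyright notice: this is an original article by weixin_39946239, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.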