Docker, NVIDIA Docker, DIGITS Installation Notes
Prerequisites
- OS: Ubuntu 16.04
- CUDA driver and toolkit already installed
Steps
Install Docker
Create the following script and run it with sudo.
$ cat ./nvidia-docker-setup_1604.sh
#!/bin/sh
# Stop at the first failing command.
set -e
# Install Docker
apt-get update
apt-get install -y apt-transport-https ca-certificates
apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
# Overwrite (not append) so that re-running the script does not duplicate the entry.
echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y linux-image-extra-$(uname -r) linux-image-extra-virtual
apt-get install -y docker-engine
# Smoke test: run the hello-world image.
docker run hello-world
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi
$ chmod +x ./nvidia-docker-setup_1604.sh
$ sudo ./nvidia-docker-setup_1604.sh
For reference, see (http://qiita.com/ksasaki/items/bd85786171424901b27d).
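After the script finishes, it can be useful to confirm that the expected commands are actually on the PATH before going further. The following is a minimal sketch; the helper name missing_cmds is hypothetical and not part of Docker or nvidia-docker.

```shell
#!/bin/sh
# Print the name of every command in the argument list that cannot be
# found on the PATH. Prints nothing when all of them are available.
# (missing_cmds is an illustrative helper, not an existing tool.)
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example:
#   missing_cmds docker nvidia-docker
```

An empty result suggests both tools were installed; any name printed points at the step of the script that needs another look.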
Start the container
nvtest@WT72:~$ sudo nvidia-docker run --name digits -d -p 8080:34448 nvidia/digits
Using default tag: latest
latest: Pulling from nvidia/digits
862a3e9af0ae: Already exists
6498e51874bf: Already exists
159ebdd1959b: Already exists
0fdbedd3771a: Already exists
7a1f7116d1e3: Already exists
1a2b8e5c1cb0: Pull complete
f79c18aad824: Pull complete
d750f0e72581: Pull complete
d399aa23f362: Pull complete
f7534fde9b83: Pull complete
ab6e25a40827: Pull complete
ef0932bdd7af: Pull complete
6616cddeb677: Pull complete
37db32ac8c63: Pull complete
Digest: sha256:be653fe4642928b584f44d03b206f6ddb433508c82b425d95ce6d277daa3462e
Status: Downloaded newer image for nvidia/digits:latest
7f0cb8c00720ce141b7614831f194707926dc6814250f17a722fc3ecb414961d
nvtest@WT72:~$
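The -p 8080:34448 option publishes container port 34448 on host port 8080. As a rough sketch of how to check the mapping from the host, the helper below (host_port, a hypothetical name) extracts the port from a HOST:PORT string such as the one docker port prints. It assumes a single-line result.

```shell
#!/bin/sh
# Extract the port number from a "HOST:PORT" string, for example the
# line printed by `docker port <container> <container-port>`.
# (host_port is an illustrative helper, not a Docker command.)
host_port() {
    # Strip everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Example (needs a running container named "digits"):
#   mapping=$(sudo docker port digits 34448)
#   curl -sI "http://localhost:$(host_port "$mapping")/" | head -n 1
```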
Problem: the container cannot reach the outside network
Confirming the symptom
nvtest@WT72:~$ sudo nvidia-docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7f0cb8c00720 nvidia/digits "./digits-server" 5 minutes ago Up 5 minutes 0.0.0.0:8080->34448/tcp digits
052d94824a82 nvidia/cuda "/bin/bash" 37 minutes ago Up 37 minutes awesome_brown
nvtest@WT72:~$ sudo docker exec -it digits /bin/bash
root@7f0cb8c00720:/usr/share/digits#
root@7f0cb8c00720:/usr/share/digits#
root@7f0cb8c00720:/usr/share/digits# ping google.com
^C
root@7f0cb8c00720:/usr/share/digits# exit
exit
nvtest@WT72:~$
No matter how long you wait, ping gets no response.
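A silent ping to a host name can mean either that name resolution is failing or that there is no route at all. One way to tell the two apart, sketched below, is to ping a literal IP address and resolve a name separately, then compare the exit codes. The helper name diagnose and the choice of 8.8.8.8 (Google Public DNS) as the target are assumptions for illustration.

```shell
#!/bin/sh
# Classify a connectivity problem from two exit codes:
#   $1 = exit status of pinging a literal IP address
#   $2 = exit status of resolving a host name
# (diagnose is an illustrative helper.)
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "no-route"
    elif [ "$2" -ne 0 ]; then
        echo "dns-failure"
    else
        echo "ok"
    fi
}

# Example, run inside the container:
#   ping -c 1 -W 3 8.8.8.8 >/dev/null 2>&1; ip_rc=$?
#   getent hosts google.com >/dev/null 2>&1; dns_rc=$?
#   diagnose "$ip_rc" "$dns_rc"
```

A result of dns-failure matches the situation these notes address, where the container needs explicit DNS servers.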
Fix (Ubuntu 15 and later)
First, find the IP addresses of the DNS servers the system is currently using.
On Ubuntu 15 or later, run nmcli device show <interface name> and filter the output for IP4.DNS:
nvtest@WT72:~$ nmcli device show enp4s0 | grep IP4.DNS
IP4.DNS[1]: xx.xx.xx.aa
IP4.DNS[2]: xx.xx.xx.bb
IP4.DNS[3]: xx.xx.xx.xx
IP4.DNS[4]: xx.xx.xx.xx
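The addresses listed by nmcli are the values that go into the --dns options of the Docker daemon. As a convenience, the conversion can be sketched with awk. The function name dns_opts is hypothetical, and unlike a hand-edited setting that picks only some servers, this sketch emits an option for every IP4.DNS entry it sees.

```shell
#!/bin/sh
# Read `nmcli device show <iface>` output on stdin and print one
# "--dns <address>" option per IP4.DNS line, separated by spaces.
# (dns_opts is an illustrative helper.)
dns_opts() {
    awk '/^IP4\.DNS/ { printf "%s--dns %s", sep, $2; sep = " " }
         END { print "" }'
}

# Example:
#   nmcli device show enp4s0 | dns_opts
```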
Edit the Docker configuration file (root privileges are required).
$ sudo vi /etc/default/docker
$ cat /etc/default/docker
# Docker Upstart and SysVinit configuration file
#
# THIS FILE DOES NOT APPLY TO SYSTEMD
#
# Please see the documentation for "systemd drop-ins":
# https://docs.docker.com/engine/articles/systemd/
#
# Customize location of Docker binary (especially for development testing).
#DOCKERD="/usr/local/bin/dockerd"
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns xx.xx.xx.aa --dns xx.xx.xx.bb"
# If you need Docker to use an HTTP proxy, it can also be specified here.
#export http_proxy="http://127.0.0.1:3128/"
# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"
$
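Ubuntu 15 and later use systemd, which does not read /etc/default/docker by default, so the unit file must also be changed: add an EnvironmentFile line and append $DOCKER_OPTS to ExecStart, as in the before and after listings that follow. Reference: (http://blog.benhall.me.uk/2015/07/setting-dockers-docker_opts-on-ubuntu-15-04/).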
$ cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
$ sudo vi /lib/systemd/system/docker.service
$ cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/etc/default/docker
ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
nvtest@WT72:~$
nvtest@WT72:~$ sudo systemctl daemon-reload
nvtest@WT72:~$ sudo service docker restart
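Files under /lib/systemd/system belong to the package and may be replaced on upgrade, so an alternative to editing docker.service in place is a systemd drop-in, which the comment in /etc/default/docker also points to. The sketch below only prints the drop-in content; the function name render_dropin and the file name dns.conf are hypothetical choices. The empty ExecStart= line clears the packaged command before the new one is set, and the leading "-" on EnvironmentFile tells systemd to ignore a missing file.

```shell
#!/bin/sh
# Print a systemd drop-in that loads /etc/default/docker and passes
# $DOCKER_OPTS to dockerd. The here-document is quoted so the shell
# leaves $DOCKER_OPTS for systemd to expand.
# (render_dropin is an illustrative helper.)
render_dropin() {
    cat <<'EOF'
[Service]
EnvironmentFile=-/etc/default/docker
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS
EOF
}

# Example:
#   sudo mkdir -p /etc/systemd/system/docker.service.d
#   render_dropin | sudo tee /etc/systemd/system/docker.service.d/dns.conf
#   sudo systemctl daemon-reload && sudo systemctl restart docker
```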
Fix (Ubuntu 14 and earlier)
On Ubuntu 14 or earlier, run nmcli dev list iface <interface name> and filter the output for IP4:
$ nmcli dev list iface eth0 | grep IP4
IP4.ADDRESS[1]: ip = xx.xx.xx.xx/23, gw = xx.xx.xx.xx
IP4.DNS[1]: xx.xx.xx.aa
IP4.DNS[2]: xx.xx.xx.bb
IP4.DNS[3]: xx.xx.xx.xx
IP4.DNS[4]: xx.xx.xx.xx
IP4.DOMAIN[1]: xxxxxx.com
IP4.WINS[1]: xx.xx.xx.xx
IP4.WINS[2]: xx.xx.xx.xx
IP4.WINS[3]: xx.xx.xx.xx
Edit the Docker configuration file (root privileges are required).
$ sudo vi /etc/default/docker
$ cat /etc/default/docker
# Docker Upstart and SysVinit configuration file
#
# THIS FILE DOES NOT APPLY TO SYSTEMD
#
# Please see the documentation for "systemd drop-ins":
# https://docs.docker.com/engine/articles/systemd/
#
# Customize location of Docker binary (especially for development testing).
#DOCKERD="/usr/local/bin/dockerd"
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns xx.xx.xx.aa --dns xx.xx.xx.bb"
# If you need Docker to use an HTTP proxy, it can also be specified here.
#export http_proxy="http://127.0.0.1:3128/"
# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"
$
$ sudo service docker restart
Download the MNIST dataset
nvtest@WT72:~$ sudo nvidia-docker restart digits
nvtest@WT72:~$ sudo nvidia-docker exec -it digits /bin/bash
root@7f0cb8c00720:/usr/share/digits#
root@7f0cb8c00720:/usr/share/digits#
root@0fa423b2a7ee:/usr/share/digits# ./tools/download_data/main.py mnist /opt/mnist
Downloading url=http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
Uncompressing file=train-images-idx3-ubyte.gz ...
Uncompressing file=train-labels-idx1-ubyte.gz ...
Uncompressing file=t10k-images-idx3-ubyte.gz ...
Uncompressing file=t10k-labels-idx1-ubyte.gz ...
Reading labels from /opt/mnist/train-labels.bin ...
Reading images from /opt/mnist/train-images.bin ...
Reading labels from /opt/mnist/test-labels.bin ...
Reading images from /opt/mnist/test-images.bin ...
Dataset directory is created successfully at '/opt/mnist'
Done after 130.443932056 seconds.
root@0fa423b2a7ee:/usr/share/digits# ll /opt/mnist/
total 65024
drwxr-xr-x 4 root root 4096 Sep 17 17:18 ./
drwxr-xr-x 3 root root 4096 Sep 17 12:49 ../
-rw-r--r-- 1 root root 1648877 Sep 17 17:18 t10k-images-idx3-ubyte.gz
-rw-r--r-- 1 root root 4542 Sep 17 17:18 t10k-labels-idx1-ubyte.gz
drwxr-xr-x 12 root root 4096 Sep 17 17:18 test/
-rw-r--r-- 1 root root 7840016 Sep 17 17:18 test-images.bin
-rw-r--r-- 1 root root 10008 Sep 17 17:18 test-labels.bin
drwxr-xr-x 12 root root 4096 Sep 17 17:18 train/
-rw-r--r-- 1 root root 9912422 Sep 17 17:18 train-images-idx3-ubyte.gz
-rw-r--r-- 1 root root 47040016 Sep 17 17:18 train-images.bin
-rw-r--r-- 1 root root 28881 Sep 17 17:18 train-labels-idx1-ubyte.gz
-rw-r--r-- 1 root root 60008 Sep 17 17:18 train-labels.bin
root@0fa423b2a7ee:/usr/share/digits#
root@0fa423b2a7ee:/usr/share/digits# ll /opt/mnist/train
total 3524
drwxr-xr-x 12 root root 4096 Sep 17 17:18 ./
drwxr-xr-x 4 root root 4096 Sep 17 17:18 ../
drwxr-xr-x 2 root root 163840 Sep 17 17:18 0/
drwxr-xr-x 2 root root 208896 Sep 17 17:18 1/
drwxr-xr-x 2 root root 167936 Sep 17 17:18 2/
drwxr-xr-x 2 root root 172032 Sep 17 17:18 3/
drwxr-xr-x 2 root root 167936 Sep 17 17:18 4/
drwxr-xr-x 2 root root 147456 Sep 17 17:18 5/
drwxr-xr-x 2 root root 147456 Sep 17 17:18 6/
drwxr-xr-x 2 root root 192512 Sep 17 17:18 7/
drwxr-xr-x 2 root root 159744 Sep 17 17:18 8/
drwxr-xr-x 2 root root 163840 Sep 17 17:18 9/
-rw-r--r-- 1 root root 20 Sep 17 17:18 labels.txt
-rw-r--r-- 1 root root 1860000 Sep 17 17:18 train.txt
root@0fa423b2a7ee:/usr/share/digits#
root@0fa423b2a7ee:/usr/share/digits# ll /opt/mnist/test
total 700
drwxr-xr-x 12 root root 4096 Sep 17 17:18 ./
drwxr-xr-x 4 root root 4096 Sep 17 17:18 ../
drwxr-xr-x 2 root root 36864 Sep 17 17:18 0/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 1/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 2/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 3/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 4/
drwxr-xr-x 2 root root 32768 Sep 17 17:18 5/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 6/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 7/
drwxr-xr-x 2 root root 36864 Sep 17 17:18 8/
drwxr-xr-x 2 root root 32768 Sep 17 17:18 9/
-rw-r--r-- 1 root root 20 Sep 17 17:18 labels.txt
-rw-r--r-- 1 root root 300000 Sep 17 17:18 test.txt
root@0fa423b2a7ee:/usr/share/digits#
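As a quick sanity check on the download, the entries in train.txt and test.txt can be counted. This sketch assumes each non-empty line of those files names one image; under that assumption, the standard MNIST split would give 60000 entries for train and 10000 for test. The helper name count_entries is hypothetical.

```shell
#!/bin/sh
# Count the non-empty lines of a list file such as train.txt.
# (count_entries is an illustrative helper.)
count_entries() {
    grep -c . "$1"
}

# Example, run inside the container:
#   count_entries /opt/mnist/train/train.txt
#   count_entries /opt/mnist/test/test.txt
```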
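Open DIGITS in a web browser (with the port mapping used when starting the container, http://<host>:8080/) and specify the downloaded MNIST data as the training data.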