Setting up a machine for deep learning is painful, especially when it comes to configuring CUDA. Depending on your requirements, setting up the environment can take hours or even days, not to mention the effort of replicating the same environment later.

I have helped people in our lab set up machines for deep learning research and experiments many times, and I have ended up encouraging them to use Docker. Docker might look scary at first glance, but its features cover many of the scenarios you are likely to run into.

This is the first post of a three-part series that covers:

  1. Setting up a machine as a one-time cost
  2. Configuring NVIDIA Docker
  3. A demo

There are several benefits of using Docker for deep learning.

  • Keep the libraries installed on the machine to a minimum.

All you need are the hardware driver and Docker; nothing else has to be installed on the machine. If you mess up your environment, you can destroy the old container and build a new one. (Think about how many times you have struggled with installing Python or a Python library.)

  • Choose the CUDA and cuDNN versions you want.

CUDA and cuDNN are not easy to configure, and the versions you need depend on the deep learning framework you are using. With containers, you can easily set up the work environment you want.

There are more benefits not mentioned here, but I want to get to the important part first and come back to them later.


Only three things need to be installed on the machine. Everything else will run inside a container, which the next post covers.

  1. NVIDIA driver (no need to install CUDA on the host)
  2. Docker CE
  3. NVIDIA Docker

The following steps show that a machine can be set up in just a few minutes. An Ubuntu 18.04 LTS machine is used for this demo, but the same steps should carry over to other operating systems with minor changes.
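
Since the commands in this post were written for Ubuntu 18.04, it can help to confirm what the machine is actually running before starting. The sketch below is my own addition, not part of the original setup: it sources `/etc/os-release` (the same file the nvidia-docker step reads later) and compares the result with `ubuntu18.04`. The function name `os_tag` is hypothetical.

```shell
#!/bin/sh
# Print "<ID><VERSION_ID>" (for example "ubuntu18.04") from an os-release file.
# The file is sourced in a subshell so its variables do not leak into this shell.
os_tag() {
    release_file="${1:-/etc/os-release}"
    [ -r "$release_file" ] || return 1
    ( . "$release_file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Warn, but do not abort, when the machine is not the release used in this post.
expected="ubuntu18.04"
actual="$(os_tag /etc/os-release || echo unknown)"
if [ "$actual" = "$expected" ]; then
    echo "OS matches the post: $actual"
else
    echo "OS is $actual, not $expected; package names and repositories may differ"
fi
```

On a different release the script only prints a warning, so it is safe to run before deciding how to adapt the steps.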

Last update: Wed Mar 27 2019

Update machine

sudo apt update -y
sudo apt upgrade -y

Install NVIDIA driver

sudo add-apt-repository -y ppa:graphics-drivers/ppa
sudo apt update -y
sudo apt install -y nvidia-driver-396
sudo reboot

Check the NVIDIA driver installation status

nvidia-smi

A similar result should be printed on screen.

Wed Mar 27 20:12:08 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    84W / 149W |      0MiB / 11441MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
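
If you want to check the driver from a script instead of reading the table by eye, the version can be pulled out of the banner line. This is a minimal sketch under the assumption that the banner contains the text `Driver Version: <number>` as in the output above; the helper name `driver_version` is hypothetical.

```shell
#!/bin/sh
# Extract the value after "Driver Version:" from nvidia-smi output on stdin.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    version="$(nvidia-smi | driver_version)"
    echo "Driver version: ${version:-not detected}"
else
    echo "nvidia-smi not found; the driver is not installed or not on PATH"
fi
```

`nvidia-smi --query-gpu=driver_version --format=csv,noheader` gives the same value without parsing, if you prefer to rely on the tool's own query option.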

Install Docker CE

See Docker's "Install Docker CE" documentation for other operating systems.

sudo apt install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt install -y docker-ce docker-ce-cli containerd.io
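
Before moving on, it may be worth confirming that Docker itself works. The sketch below is an addition of mine and uses Docker's `hello-world` image as a smoke test; `have_cmd` and `verify_docker` are hypothetical helper names, and `sudo` is used on the assumption that the non-root setup has not been done yet.

```shell
#!/bin/sh
# Return success when the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the Docker version, then run the hello-world image as a smoke test.
verify_docker() {
    if ! have_cmd docker; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    docker --version
    sudo docker run --rm hello-world
}

# Nothing runs automatically: call verify_docker by hand after the install.
```

Running `verify_docker` should end with the greeting printed by the `hello-world` container if the daemon is up.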

Post installation for Docker

Use Docker as a non-root user

sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot
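
After the reboot, you can confirm that the group change took effect. This is a small sketch, not part of the original steps; `list_has_word` is a hypothetical helper that looks for `docker` in the output of `id -nG`.

```shell
#!/bin/sh
# Return success when the space-separated list in $1 contains the word $2.
list_has_word() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# id -nG prints the group names of the current process.
if list_has_word "$(id -nG)" docker; then
    echo "current user is in the docker group; docker should work without sudo"
else
    echo "current user is not in the docker group yet (log out and back in, or reboot)"
fi
```

The check uses the groups of the current session on purpose: a new group membership only shows up after logging in again, which is why the post reboots at this point.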

Configure Docker to start on boot

sudo systemctl enable docker

Install nvidia-docker

Source

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update -y

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

Check the nvidia-docker installation status

nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:53:11 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false
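
The version output confirms that the packages are installed, but not that Docker knows about the NVIDIA runtime. As far as I know, `nvidia-docker2` registers the runtime in `/etc/docker/daemon.json`; the sketch below assumes that location and simply looks for `nvidia-container-runtime` in the file. The helper name `has_nvidia_runtime` is hypothetical.

```shell
#!/bin/sh
# Return success when the Docker daemon config file mentions nvidia-container-runtime.
# The default path is an assumption about where nvidia-docker2 writes its config.
has_nvidia_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$config_file" ] && grep -q 'nvidia-container-runtime' "$config_file"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
    echo "nvidia runtime is registered in /etc/docker/daemon.json"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi

# Smoke test to run by hand (the image tag is an assumption and may need changing):
#   docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
```

If the smoke test prints the same table as `nvidia-smi` on the host, the GPU is visible from inside a container.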

The Docker configuration will be covered in the next post. Once you know how to configure the environment, your deep learning models are ready to be delivered worldwide!