Heaton Research

AWS EC2 Data Science: My Jupyter Workspaces for GPU

This blog post describes how I created a GPU JupyterHub instance to use with TensorFlow in AWS. The instance created from these instructions can run on a variety of EC2 machines. Pricing information can be found on the AWS EC2 pricing page. You might want to start with the following instance type:

  • p2.xlarge: Single high performance GPU.
  • So far, I have not experimented with other GPU EC2 instances.

This is a functional, but not ideal, setup for AWS GPU in TensorFlow. Ideally, I would like to use NVIDIA Docker. However, I have not yet gotten Docker to work with GPUs, so this approach installs directly to the VM.

Installing GPU drivers can be very complex. You are really dealing with three layers, and all of them must use compatible versions:

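  • CUDA GPU Drivers - The low-level drivers that allow access to the compute capabilities of your card.
  • cuDNN - A special library for using CUDA with deep learning neural networks (DNN).
  • TensorFlow - The framework itself, whose GPU build must match the CUDA and cuDNN versions installed beneath it.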
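
To see which versions are actually installed at each layer, a small script along these lines can help. This is only a sketch: `report` is a helper I made up for illustration, and it assumes `nvidia-smi`, `nvcc`, and `python` are on the PATH and that the cuDNN header lives at /usr/local/cuda/include/cudnn.h, none of which I have confirmed for this AMI. Anything that is missing is reported rather than treated as an error.

```shell
#!/bin/sh
# Sketch: print the version of each GPU layer so mismatches are easy to spot.
# Tool names and the cudnn.h path below are assumptions, not confirmed for the AMI.

# report NAME CMD [ARGS...]: run CMD if it exists, otherwise say it is missing.
report() {
    name="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $name"
        # Keep the output short; errors are shown inline instead of aborting.
        "$@" 2>&1 | head -n 20
    else
        echo "== $name: '$1' not found on PATH"
    fi
}

report "NVIDIA driver" nvidia-smi
report "CUDA toolkit" nvcc --version
# Assumed header location; older cuDNN releases define the version in cudnn.h.
report "cuDNN" grep -m1 -A2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h
report "TensorFlow" python -c "import tensorflow as tf; print(tf.__version__)"
```

Run it as ec2-user, since that is the account that can see the Anaconda Python installation.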

The method that I used was to start from a premade Amazon AMI for TensorFlow and CUDA. The latest information on this AMI can be found here:

Create your instance from the Deep Learning AMI with Source Code (CUDA 8, Amazon Linux). A few important points on setting up your instance:

  • Your login user will be ec2-user. This is also the user that has access to the Anaconda Python installation.
  • The root user does not have access to the Anaconda Python install. Because of this, you cannot run JupyterHub from root. This is first on my list to fix the next time I build a GPU VM.

Once you log into your new instance, update all software with the following command:

sudo yum update

Next install JupyterHub:

sudo yum install jupyterhub

Once JupyterHub is launched, it will listen on port 8000 by default. It would be better to move this to port 80 or port 443. However, because we are running the service as ec2-user and not root, the process cannot bind to either of these ports directly, since ports below 1024 are reserved for root. To work around this, use firewall routing to redirect port 80 to port 8000 (you can find a good description of the process here):

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8000

Of course, make sure to save your iptables changes to a file:

sudo sh -c "iptables-save > /etc/iptables.conf"

Then add the following command to /etc/rc.local to reload the rules on every reboot.

iptables-restore < /etc/iptables.conf
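
Before rebooting, it is worth checking that the rules file really contains the redirect. The sketch below assumes the rules live in /etc/iptables.conf and are in the format written by iptables-save; `has_redirect` is a hypothetical helper of my own, not part of iptables.

```shell
#!/bin/sh
# Sketch: confirm a saved rules file contains a port redirect.
# has_redirect is a hypothetical helper, not an iptables command.

# has_redirect FILE FROM_PORT TO_PORT: succeed if FILE has a matching REDIRECT rule.
has_redirect() {
    rules_file="$1"
    from_port="$2"
    to_port="$3"
    # iptables-save writes the option as --to-ports; accept --to-port as well.
    grep -E -q -- "--dport ${from_port} .*-j REDIRECT --to-ports? ${to_port}( |\$)" "$rules_file"
}

# Only check when the file is readable by the current user.
if [ -r /etc/iptables.conf ]; then
    if has_redirect /etc/iptables.conf 80 8000; then
        echo "redirect 80 -> 8000 is saved"
    else
        echo "redirect 80 -> 8000 is missing from /etc/iptables.conf"
    fi
fi
```

Note that a PREROUTING rule only applies to traffic arriving from outside the machine, so test port 80 from your own computer rather than from the instance itself.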

You can now start JupyterHub with the following:

nohup jupyterhub &
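
If you would rather keep a log and confirm that the hub actually came up, something like the following may be useful. It is a sketch under a few assumptions: `wait_for_url` is a helper I made up, the log file name is arbitrary, curl is installed, and the hub serves its login page at /hub/login on port 8000.

```shell
#!/bin/sh
# Sketch: start JupyterHub in the background with a log file, then poll until it answers.
# wait_for_url and the log path are illustrative choices, not part of JupyterHub.

# wait_for_url URL TRIES: poll URL once per second, succeed on the first HTTP response.
wait_for_url() {
    url="$1"
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        # curl exits 0 on any HTTP response and non-zero if the connection fails.
        if curl -s -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Only try to launch when jupyterhub is installed for the current user.
if command -v jupyterhub >/dev/null 2>&1; then
    nohup jupyterhub > "$HOME/jupyterhub.log" 2>&1 &
    if wait_for_url http://localhost:8000/hub/login 30; then
        echo "JupyterHub is up"
    else
        echo "JupyterHub did not answer; see $HOME/jupyterhub.log"
    fi
fi
```

Redirecting the output means the messages end up in jupyterhub.log instead of the default nohup.out.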

You will log in to your server with “ec2-user” as your username. Make sure to assign “ec2-user” a password first.