Heaton Research

AWS EC2 Data Science: My Jupyter Workspaces for Docker CPU Only

This blog post describes how I created a CPU-only JupyterHub instance that runs from Docker on AWS. The instance created from these instructions can run on a variety of EC2 instance types. Pricing varies by instance type; see the Amazon EC2 pricing page. You might want to start with one of the following three instance types:

  • t2.large: 8 GB memory, plenty for install and testing
  • t2.xlarge: 16 GB, more memory
  • t2.2xlarge: 32 GB, even more memory

The first step is to install Docker. Create your instance from the basic Amazon Linux AMI. A few important points on setting up your instance:

  • Your login user will be ec2-user. You will be given a private key that is used to log in as this user.
  • Your security group should expose the following ports: 22 (for SSH login), 80 (for normal HTTP), and 443 (for HTTPS).
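Once the instance is up, it can be handy to confirm from your own machine which of those ports are reachable. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration. Keep in mind that a port reports as closed both when the security group blocks it and when nothing is listening on it yet, so 443 will only show as open once the JupyterHub container is running.

```python
import socket
from typing import Dict, Iterable


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host: str, ports: Iterable[int] = (22, 80, 443)) -> Dict[int, bool]:
    """Check each port and return a {port: reachable} mapping."""
    return {port: is_port_open(host, port) for port in ports}
```

Calling check_ports with your instance's public DNS name or IP address returns a dictionary such as {22: True, 80: False, 443: False}, depending on what is reachable at that moment.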

Once you log into your new instance, update all software with the following command:

sudo yum update

Install Docker with the following command:

sudo yum install docker

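On Amazon Linux, start the Docker service with sudo service docker start before you use it. The docker commands below need root privileges, so either prefix them with sudo or add ec2-user to the docker group.

Create a directory to hold the files for your Docker build. Obtain my Dockerfile for JupyterHub and the supporting Python script (jupyterhub_config.py, which the Dockerfile copies into the image), and place both in that directory.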

Build the Docker image named hri-jupyter-hub by running the following command from that directory:

docker build -t hri-jupyter-hub --no-cache .

Before you build, you will probably want to make some small changes to the Dockerfile, such as the user name. The Dockerfile is shown here:

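# Start from the official JupyterHub image
FROM jupyterhub/jupyterhub:latest

# JupyterHub configuration
COPY jupyterhub_config.py /srv/jupyterhub/jupyterhub_config.py

# Create the login user (change the name to your own).
# The password here is temporary; replace it once the container is running.
RUN useradd -ms /bin/bash jtheaton
RUN echo jtheaton:temppwd123 | chpasswd
RUN chown jtheaton /home/jtheaton

# Data science packages (-y so conda does not wait for confirmation)
RUN conda install -y scipy
RUN pip install notebook
# The PyPI package is named scikit-learn; "sklearn" is only the import name
RUN pip install scikit-learn
RUN pip install pandas
RUN pip install pandas-datareader
RUN pip install matplotlib
RUN pip install pillow
RUN pip install requests
RUN pip install h5py
RUN pip install gensim
RUN pip install tensorflow==1.3.0
RUN pip install keras==2.0.8

# JupyterHub listens on port 8000 by default
EXPOSE 8000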
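The Dockerfile copies a jupyterhub_config.py into the image, so here is a rough idea of what such a file can look like. This is a hypothetical minimal sketch, not my actual configuration: it assumes the hub serves HTTPS itself from certificate files under /etc/ssl/ inside the container, and the two file names are placeholders you would replace with your own.

```python
# jupyterhub_config.py: hypothetical minimal sketch, not the original file.
# get_config() is provided by JupyterHub when it loads this file, so this
# script is not meant to be run on its own.
c = get_config()  # noqa: F821

# Serve HTTPS directly from the hub. Both file names are placeholders for
# whatever certificate and key you place under /etc/ssl/.
c.JupyterHub.ssl_cert = "/etc/ssl/jupyter.crt"
c.JupyterHub.ssl_key = "/etc/ssl/jupyter.key"

# The default authenticator logs users in with their system accounts, which
# is why the Dockerfile creates one with useradd. Make that user a hub admin.
c.Authenticator.admin_users = {"jtheaton"}
```

The listening port is left at JupyterHub's default of 8000 in this sketch.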

This image requires that two volumes be mapped:

  • -v/home/ec2-user/projects/ - This is where your projects (the notebooks you create) will reside on the host. I map mine to /home/jtheaton inside the container.
  • -v/home/ec2-user/ssl/ - This is where your SSL certificates will reside on the host. I map mine to /etc/ssl/ inside the container, read-only (:ro).

To actually start up the Docker image you have several options. Ultimately you will want JupyterHub to always run in the background and to start when the instance is booted. This command will accomplish that:

docker run -dit --restart unless-stopped --name jupyterhub -p443:8000 -v/home/ec2-user/projects/:/home/jtheaton -v/home/ec2-user/ssl/:/etc/ssl/:ro hri-jupyter-hub

A few complications here. I use an SSL certificate for HTTPS. I bought one from Namecheap, but that is actually unnecessary. On my next pass at this I am going to use Let's Encrypt: https://letsencrypt.org/

The -v options above are mappings from my AWS instance into the Docker container. That way the container can modify these files on the host, and if I rebuild my Docker image, nothing gets destroyed. My mappings are:

/home/ec2-user/projects/:/home/jtheaton
/home/ec2-user/ssl/:/etc/ssl/:ro

The left side of the : is my AWS host directory; the right side is the path inside the Docker container. I create a user in the image named jtheaton, which is the user I log into JupyterHub as.
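To make the host:container[:mode] structure concrete, here is a small sketch that splits such a mapping into its parts. It is a simplification for illustration only: it assumes plain Linux paths without colons, and the VolumeMapping and parse_volume names are invented for this example rather than part of Docker.

```python
from typing import NamedTuple


class VolumeMapping(NamedTuple):
    host_path: str        # directory on the AWS host
    container_path: str   # where it appears inside the container
    read_only: bool       # True when the mode includes "ro"


def parse_volume(spec: str) -> VolumeMapping:
    """Split a 'host:container[:mode]' string as passed to docker run -v."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected host:container[:mode], got {spec!r}")
    mode = parts[2] if len(parts) == 3 else "rw"
    # The mode can hold several comma-separated options; only "ro" matters here.
    return VolumeMapping(parts[0], parts[1], "ro" in mode.split(","))
```

For the second mapping above, parse_volume("/home/ec2-user/ssl/:/etc/ssl/:ro") gives a host path of /home/ec2-user/ssl/, a container path of /etc/ssl/, and read_only set to True.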

The first directory is where all my Jupyter notebooks go. The second is where my SSL certificates go (read-only, ro). You should also notice the -p443:8000 option. This maps the container's internal port 8000 to external port 443. Port 8000 is the default JupyterHub port and 443 is the HTTPS port that we are running JupyterHub on. Because I am using an SSL certificate, I use 443 instead of 80.
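If you keep your port and volume settings in a script, the run command can be assembled from them instead of typed by hand. The sketch below shows one way to do that; docker_run_command is a hypothetical helper of my own naming, and it only builds the argument list, it does not start anything.

```python
import shlex
from typing import Dict, List


def docker_run_command(
    image: str,
    name: str,
    ports: Dict[int, int],
    volumes: List[str],
    restart: str = "unless-stopped",
) -> List[str]:
    """Build the docker run argument list; ports maps host port to container port."""
    cmd = ["docker", "run", "-dit", "--restart", restart, "--name", name]
    for host_port, container_port in ports.items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    for volume in volumes:
        cmd += ["-v", volume]
    cmd.append(image)
    return cmd


command = docker_run_command(
    image="hri-jupyter-hub",
    name="jupyterhub",
    ports={443: 8000},  # host 443 (HTTPS) -> container 8000 (JupyterHub default)
    volumes=[
        "/home/ec2-user/projects/:/home/jtheaton",
        "/home/ec2-user/ssl/:/etc/ssl/:ro",
    ],
)
print(shlex.join(command))
```

The printed line is equivalent to the docker run command shown earlier, and the list in command can be passed to subprocess.run if you want the script to start the container itself.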

You should change the password of the user that the Dockerfile above creates. My example Dockerfile created a user named jtheaton with a temporary password of temppwd123. I do not recommend hard-coding a real password into a Dockerfile. To change your user's password, you must connect to a shell in the running Docker container with the following command:

docker exec -i -t 133a4789ef57 /bin/bash

The value 133a4789ef57 is the Docker container id, which you can obtain by running docker ps. Once connected to the container you will be at a command prompt as root. To change the user's password, simply run the following command:
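If you would rather look the id up in a script than copy it by hand, the docker ps output can be filtered by container name. The following sketch assumes the container was started with --name jupyterhub as in the run command above; both function names are invented for this example.

```python
import subprocess
from typing import List


def container_ids_by_name(ps_output: str, name: str) -> List[str]:
    """Pick the ids whose container name matches exactly from 'ID NAME' lines."""
    ids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == name:
            ids.append(fields[0])
    return ids


def running_container_id(name: str = "jupyterhub") -> str:
    """Ask docker ps for running containers and return the id for the given name."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    ids = container_ids_by_name(result.stdout, name)
    if not ids:
        raise LookupError(f"no running container named {name!r}")
    return ids[0]
```

Note that docker exec also accepts the container name in place of the id, so docker exec -i -t jupyterhub /bin/bash works as well when the container was started with that name.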

passwd jtheaton
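The passwd command prompts for the new password interactively. If you want a random password rather than one you make up, Python's standard secrets module can generate it. This is only a sketch; the temporary_password name and the default length of 16 are arbitrary choices for illustration.

```python
import secrets
import string


def temporary_password(length: int = 16) -> str:
    """Return a random password made of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Run print(temporary_password()) on your own machine, then paste the result at the passwd prompt and store it somewhere safe.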

Of course, you should replace jtheaton with your user. I also assume that you map a hostname to your IP address, such as jupyter.heatonresearch.com. Because only port 443 is mapped, I access my Jupyter server over HTTPS at https://jupyter.heatonresearch.com.