Parallel R in Docker Containers

Published on Saturday, August 20, 2016

At our R User Group the question came up of how to use Docker to set up parallel computing of R code on a cluster of Docker containers. In setting up an example as a proof of concept I ran into more trouble than I had expected. Most of the problems were networking issues, and these notes may be of use to you if you try to set up something similar.

The code of the example cluster I set up is on GitHub at FvD/ralparque. If you would like to see it translated to English, please let me know. All the corresponding Dockerfiles are there, and I hope that where the code does not speak for itself, this blog post will explain what happened.

There are a couple of very good posts about parallel computing in R and using the snow package to do so. The tips from Max Gordon help you decide what approach to use. But since I only wanted to set up a proof of concept, I used the article and the code from Hernan Resnizky to have a working example.

Setup SSH Service for debian:testing

To set up the containers I used Rocker r-base, and that was the cause of the issue that took the most time to solve. Rocker uses debian:testing, but the canonical Docker example of a dockerized SSH daemon service, which snow requires on the worker containers, is written with FROM ubuntu:14.04. It turns out that the sshd_config file in debian:testing is a little different from the one in ubuntu:14.04.

So instead of using sed to change PermitRootLogin without-password, as stated in the Docker example code:

RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config

we need to replace PermitRootLogin prohibit-password with PermitRootLogin yes, as follows:

RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
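
Because the default value differs between base images, a pattern that does not depend on the current value may be more robust. The following is only a minimal sketch: it assumes GNU sed (as shipped in Debian and Ubuntu) and runs against a throwaway file created with mktemp rather than the real /etc/ssh/sshd_config, so the file contents here are illustrative.

```shell
# Work on a temporary file so the sketch is safe to run anywhere.
cfg="$(mktemp)"
printf '%s\n' 'Port 22' '#PermitRootLogin prohibit-password' > "$cfg"

# Replace the directive whatever its current value, commented out or not.
sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$cfg"

# Verify the change: sed exits with status 0 even when nothing matched,
# so a pattern that misses would otherwise go unnoticed.
grep '^PermitRootLogin yes$' "$cfg"
```

In a Dockerfile the same two commands could be chained in one RUN instruction with &&, so that the build fails when the directive was not rewritten.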

Setup SSH Keys

We then also need to generate an SSH key pair with ssh-keygen, not only in the image for the worker containers but also in the image for the leader container. Otherwise it is not possible to ssh-copy-id the public keys from the worker containers to the leader.

RUN ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

This is the part that still needs improvement. I can follow the steps to create the leader and the workers and set them up as cluster members. But for some reason makeCluster will ask for a password for each of the workers, regardless of whether ssh-copy-id ran successfully for each worker.
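
One way to narrow this down is to test key-based login separately from R. For the leader to log in without a password, the leader's public key has to be present in each worker's authorized_keys file. The sketch below is a hypothetical helper, not part of the repository: check_worker and report_workers are made-up names, and it assumes the login user is root. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: succeeds only if login works without any prompt.
# Assumes the workers are accessed as root.
check_worker() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Print one status line per worker address passed as an argument.
report_workers() {
  for ip in "$@"; do
    if check_worker "$ip"; then
      echo "$ip: key login OK"
    else
      echo "$ip: key login FAILED"
    fi
  done
}

# Usage on the leader:
#   report_workers 172.18.0.2 172.18.0.3 172.18.0.4
```

A FAILED line can also mean that the worker's host key has not been accepted yet, since BatchMode suppresses that question as well. Running ssh -v against a single worker usually shows which of the two it is.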

Setup Docker network

To get all the Docker containers to communicate successfully, the docker network command was useful:

$ sudo docker network create --subnet=172.18.0.0/16 simonbolivar

I would have expected that this would isolate the containers from access other than through this network. But in the current set-up that is not the case yet, perhaps because I have explicitly EXPOSE'd port 22. Another point for refinement.
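
A likely contributor is the -P flag passed to docker run in this example: it publishes every EXPOSEd port on a random port of the host. A user-defined bridge network is also not closed off from the outside unless it is created with --internal. The sketch below shows how this could be inspected and tightened. It is untested against this project, and the run helper is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run helper: prints each command instead of executing it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Show which host port, if any, is mapped to port 22 of a running worker.
run sudo docker port aterciopelados 22

# Variant of the network that restricts external access to it.
run sudo docker network create --internal --subnet=172.18.0.0/16 simonbolivar

# Worker started without -P, so the EXPOSEd port 22 is not published on the host.
run sudo docker run -d --net simonbolivar --ip 172.18.0.2 --name aterciopelados -it artista
```

Containers on an --internal network cannot reach the internet either, so any package installation has to happen at image build time.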

Full example: Cluster with 3 workers

If you want to walk through the example, and can ignore any language issues, then the following would be the steps to get a leader with three workers running:

  1. Download / clone the project at FvD/ralparque.

  2. Set up the Docker network

     $ sudo docker network create --subnet=172.18.0.0/16 simonbolivar
    
  3. Set up the workers

     $ cd manual/artistas
     $ sudo docker build -t artista .
    
     $ sudo docker run -d -P --net simonbolivar --ip 172.18.0.2 --name aterciopelados -it artista
     $ sudo docker run -d -P --net simonbolivar --ip 172.18.0.3 --name fabulosos_cadillacs -it artista
     $ sudo docker run -d -P --net simonbolivar --ip 172.18.0.4 --name mana -it artista
    
  4. Set up the leader

     $ cd ../organizador
     $ sudo docker build -t organizador .
    
     $ sudo docker run --net simonbolivar --ip 172.18.0.10 --name duarte -it organizador
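
The three docker run commands for the workers in step 3 differ only in name and address, so they can be generated from a list. This is a small sketch rather than part of the repository: worker_commands is a hypothetical function name, and it only prints the commands so they can be reviewed first.

```shell
# Worker names and the last octet of their address, as used in step 3.
workers="aterciopelados:2 fabulosos_cadillacs:3 mana:4"

# Print one docker run command per worker.
worker_commands() {
  for entry in $workers; do
    name="${entry%%:*}"    # text before the colon
    octet="${entry##*:}"   # text after the colon
    echo "sudo docker run -d -P --net simonbolivar --ip 172.18.0.${octet} --name ${name} -it artista"
  done
}

worker_commands
```

Once the printed commands look right, worker_commands | sh executes them. Adding a worker then only requires a new name:octet pair in the list.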
    

During step four you will be asked to confirm the connection to each worker container and to enter its password, which in this example is artista.

I would love to hear from you how this proof of concept can be improved. Or perhaps you can point me in the right direction to have snow::makeCluster connect to the workers without user intervention. I did not find nearly as many recipes for setting up a cluster of Docker containers for parallel computing with R as I had expected.