Hail in a docker

Looking forward to a docker version of Hail.

EDIT: This does not work, use this instead.


It is easy to build one yourself from this Dockerfile:

FROM python:3.6
RUN apt-get update && apt-get install -y \
    openjdk-8-jre \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail ipython

Are there plans to publish new versions of Hail as their own Docker images? That way, whenever there are bug fixes or patches, one can pull the latest version.

@danking I tried using that script and am seeing an error:

wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ cat dockerfile
FROM python:3.6

#to install hail
RUN apt-get update && apt-get install -y \
    openjdk-8-jre \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail ipython

COPY *.sh /
COPY *.py / 
	
CMD ["/bin/bash"]
wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ docker build .
Sending build context to Docker daemon  113.4MB
Step 1/5 : FROM python:3.6
 ---> c4f7d42f7b89
Step 2/5 : RUN apt-get update && apt-get install -y     openjdk-8-jre     && rm -rf /var/lib/apt/lists/* &&     pip3 --no-cache-dir install hail ipython
 ---> Running in d1f643cfd261
Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [187 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 B]
Fetched 8338 kB in 2s (4731 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package openjdk-8-jre
The command '/bin/sh -c apt-get update && apt-get install -y     openjdk-8-jre     && rm -rf /var/lib/apt/lists/* &&     pip3 --no-cache-dir install hail ipython' returned a non-zero code: 100

My apologies, this is the latest one that I use:

FROM python:3.6.9-slim-stretch
# re: mkdir, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199#23
RUN mkdir -p /usr/share/man/man1 && \
    apt-get update && apt-get install -y \
    openjdk-8-jre-headless \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail==0.2.37 ipython
ENTRYPOINT ["ipython"]
CMD []

I keep a repository of this docker file on GitHub: https://github.com/danking/docker-hail.

There are plans currently under way to regularly publish a Hail docker image versioned by the Hail PyPI version. We have a minor technical hurdle to address in our continuous deployment system.

thanks Dan! appreciate the quick response.

That worked for me, but when I went to use gcloud (to submit a Python script) I realized it’s not part of the installation. Is there another Dockerfile that shows how to add gcloud? Or would this work if my base image were a gcloud image?

As things relate to GCP, there are a few points:

  1. Google has a page on installing gcloud/gsutil. I use this (nb: the paths will depend on the installing user):
RUN curl https://sdk.cloud.google.com | bash && \
    echo '. /root/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc && \
    echo '. /root/google-cloud-sdk/path.bash.inc' >> ~/.bashrc
  2. If you need Hail to read files in GCS buckets, then you need the GCS Hadoop connectors. Ben Weisburd wrote a script to install that: curl -sSL broad.io/install-gcs-connector | python3. However, see the next point.
  3. You need GCP service account keys. You’ll need to arrange for those to be securely mounted into the container at run-time. Unfortunately, Weisburd’s script only works after the keys are present.
  4. A gcloud base image should work if it has python3.6.
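
As a rough illustration of mounting the keys at run-time, here is a minimal sketch. The function name, the key path on the host, the /gsa-key/key.json path inside the container, and the image tag are all hypothetical, and it assumes the image's entrypoint is a shell. It only covers getting the key into the container; how Weisburd's script locates the key is not shown here.

```shell
# run_hail_with_key KEY_FILE IMAGE
# Start an interactive container with the service account key mounted
# read-only at /gsa-key/key.json (an arbitrary, hypothetical path).
# KEY_FILE must be an absolute path on the host.
run_hail_with_key() {
    key_file="$1"
    image="$2"
    docker run -it \
        -v "${key_file}:/gsa-key/key.json:ro" \
        "${image}"
}

# Example (hypothetical key path and image tag):
#   run_hail_with_key "$HOME/keys/hail-sa.json" hail-gcloud
#
# Then, inside the container, activate the key and install the connector:
#   gcloud auth activate-service-account --key-file=/gsa-key/key.json
#   curl -sSL broad.io/install-gcs-connector | python3
```

Mounting the key at run-time, rather than COPYing it in, keeps the key out of the image layers.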

Thanks Dan. I appreciate the piece about GCS connectors; it wouldn’t have occurred to me. I’ve seen the docs on how to build an image with gcloud. However, when one writes a Dockerfile that installs both gcloud and Hail into one image, it doesn’t seem to work. I was hoping to find a way to build Hail and gcloud into one image.

So if I try to add the gcloud install into a dockerfile with the hail install (*minor modification to your original – I’m excluding ipython), it fails with a curl issue – and I can work around it with a different base image that has curl.

wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ cat dockerfile 
FROM python:3.6.9-slim-stretch

# re: mkdir, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199#23

RUN mkdir -p /usr/share/man/man1 && \
    apt-get update && apt-get install -y \
    openjdk-8-jre-headless \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail==0.2.37

COPY *.sh /
COPY *.py / 

RUN curl https://sdk.cloud.google.com | bash && \
    echo '. /root/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc && \
    echo '. /root/google-cloud-sdk/path.bash.inc' >> ~/.bashrc

ENTRYPOINT ["/bin/bash"]
wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ docker build .
Sending build context to Docker daemon  113.4MB
Step 1/6 : FROM python:3.6.9-slim-stretch
 ---> fa79f489b3bf
Step 2/6 : RUN mkdir -p /usr/share/man/man1 &&     apt-get update && apt-get install -y     openjdk-8-jre-headless     && rm -rf /var/lib/apt/lists/* &&     pip3 --no-cache-dir install hail==0.2.37
 ---> Using cache
 ---> 2fb86c0ef970
Step 3/6 : COPY *.sh /
 ---> Using cache
 ---> e648275fbc6f
Step 4/6 : COPY *.py /
 ---> Using cache
 ---> 32cf651428d8
Step 5/6 : RUN curl https://sdk.cloud.google.com | bash &&     echo '. /root/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc &&     echo '. /root/google-cloud-sdk/path.bash.inc' >> ~/.bashrc
 ---> Running in 8763102e47ad
/bin/sh: 1: curl: not found
Removing intermediate container 8763102e47ad
 ---> a9ac2222b211
Step 6/6 : ENTRYPOINT ["/bin/bash"]
 ---> Running in cc3a8630831e
Removing intermediate container cc3a8630831e
 ---> fdc807240e93
Successfully built fdc807240e93

Second attempt, using the non-slim Python image as the base. That image has curl, but the Hail install step fails on it (error below).

wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ cat dockerfile 
FROM python:3.6.9

# re: mkdir, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199#23

RUN mkdir -p /usr/share/man/man1 && \
    apt-get update && apt-get install -y \
    openjdk-8-jre-headless \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail==0.2.37

COPY *.sh /
COPY *.py / 

RUN curl https://sdk.cloud.google.com | bash && \
    echo '. /root/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc && \
    echo '. /root/google-cloud-sdk/path.bash.inc' >> ~/.bashrc

ENTRYPOINT ["/bin/bash"]
wm0ea-9ad:hail-ukbb-200k-callset rmunshi$ docker build .
Sending build context to Docker daemon  113.4MB
Step 1/6 : FROM python:3.6.9
 ---> 5bf410ee7bb2
Step 2/6 : RUN mkdir -p /usr/share/man/man1 &&     apt-get update && apt-get install -y     openjdk-8-jre-headless     && rm -rf /var/lib/apt/lists/* &&     pip3 --no-cache-dir install hail==0.2.37
 ---> Running in aebcef96cf41
Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [187 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 B]
Fetched 8338 kB in 2s (5538 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Package openjdk-8-jre-headless is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'openjdk-8-jre-headless' has no installation candidate
The command '/bin/sh -c mkdir -p /usr/share/man/man1 &&     apt-get update && apt-get install -y     openjdk-8-jre-headless     && rm -rf /var/lib/apt/lists/* &&     pip3 --no-cache-dir install hail==0.2.37' returned a non-zero code: 100

Hey! Sorry about the long delay here. I had some bad notifications settings.

It looks like python:3.6.9 is based on Debian 10.3 Buster:

docker run -it python:3.6.9 /bin/sh -c 'apt-get update && apt-get install -y lsb-release && lsb_release -a'
...
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

Debian 10 (buster) apparently dropped JDK 8 for security reasons; I have not dug into the exact reasoning. You might try one of the other python:3.6.9 images, such as stretch:

docker run -it python:3.6.9-stretch /bin/sh -c 'apt-get update && apt-get install openjdk-8-jre-headless'
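
To compare several tags without attempting a full install, something like the following sketch may help. The function name check_jdk8 is made up for illustration; it relies only on apt-cache policy, which reports the install candidate for a package.

```shell
# check_jdk8 TAG...
# For each python image tag, print apt's view of openjdk-8-jre-headless.
# "Candidate: (none)", or no policy output at all, means the package
# cannot be installed from that image's configured repositories.
check_jdk8() {
    for tag in "$@"; do
        echo "== python:${tag}"
        docker run --rm "python:${tag}" /bin/sh -c \
            'apt-get update -qq && apt-cache policy openjdk-8-jre-headless'
    done
}

# Example: check_jdk8 3.6.9 3.6.9-stretch
```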

You could also try using wget, or downloading Weisburd’s script outside of Docker, placing it into the Docker build context, COPYing it in, and calling it with bash.
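
Putting the pieces from this thread together, a combined Hail and gcloud image might look like the sketch below. It is untested here and rests on a few assumptions: that the curl package installs cleanly from the stretch repositories, that the gcloud installer runs without prompting under docker build, and that Dockerfile.hail-gcloud is just a placeholder file name. The SHELL line is an addition not in the Dockerfiles above. In the slim build earlier in the thread, the image was reported as successfully built even though curl was missing, because a pipeline returns the exit status of its last command (here bash). With pipefail set, a failing curl fails the build.

```shell
# Write a combined Dockerfile (the file name is a placeholder).
cat > Dockerfile.hail-gcloud <<'EOF'
FROM python:3.6.9-slim-stretch

# re: mkdir, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199#23
# curl is added because the slim image does not include it.
RUN mkdir -p /usr/share/man/man1 && \
    apt-get update && apt-get install -y \
    openjdk-8-jre-headless \
    curl \
    && rm -rf /var/lib/apt/lists/* && \
    pip3 --no-cache-dir install hail==0.2.37

# Not in the original: make a failing curl fail the whole pipeline.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN curl https://sdk.cloud.google.com | bash && \
    echo '. /root/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc && \
    echo '. /root/google-cloud-sdk/path.bash.inc' >> ~/.bashrc

ENTRYPOINT ["/bin/bash"]
EOF
```

Build it with docker build -f Dockerfile.hail-gcloud -t hail-gcloud . (the tag is also a placeholder).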

Following this thread: is there a way to submit Hail scripts through Terra’s Google Cloud interface? Right now it seems that we can only use the notebook interface rather than making Hail part of a workflow.