Building hail in Docker

I am currently trying to create a Docker image with a custom version of Hail, but I am running into problems.
The Dockerfile looks somewhat like this:

    FROM ubuntu:20.10 AS hail
    RUN apt-get update && apt-get install -y wget make g++ curl libopenblas-base liblapack3 liblz4-dev rsync openjdk-8-jre-headless openjdk-8-jdk liblz4-dev python3 python3-pip
    COPY hail/hail /hail
    RUN cd /hail && make clean
    COPY hail/ /
    RUN pip install jupyterlab
    RUN cd /hail && make install HAIL_COMPILE_NATIVES=1
    EXPOSE 8888
    CMD ["jupyter", "lab", "--ip=''", "--port=8888", "--no-browser", "--allow-root"]

When I run a container with the image, I can import hail, but I can’t call hl.init().

    >>> import hail as hl
    >>> hl.init()
    2021-05-11 19:27:35 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<decorator-gen-1763>", line 2, in init
    File "/usr/local/lib/python3.8/dist-packages/hail/typecheck/", line 577, in wrapper
        return __original_func(*args_, **kwargs_)
    File "/usr/local/lib/python3.8/dist-packages/hail/", line 246, in init
        backend = SparkBackend(
    File "/usr/local/lib/python3.8/dist-packages/hail/backend/", line 171, in __init__
        self._jbackend = hail_package.backend.spark.SparkBackend.apply(
    File "/usr/local/lib/python3.8/dist-packages/py4j/", line 1304, in __call__
        return_value = get_return_value(
    File "/usr/local/lib/python3.8/dist-packages/py4j/", line 326, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
    : java.lang.ExceptionInInitializerError
            at is.hail.backend.spark.SparkBackend$.createSparkConf(SparkBackend.scala:78)
            at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:127)
            at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:203)
            at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(
            at java.lang.reflect.Method.invoke(
            at py4j.reflection.MethodInvoker.invoke(
            at py4j.reflection.ReflectionEngine.invoke(
            at py4j.Gateway.invoke(
            at py4j.commands.AbstractCommand.invokeMethod(
            at py4j.commands.CallCommand.execute(
    Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 12
            at java.lang.String.substring(
            at is.hail.package$.<init>(package.scala:46)
            at is.hail.package$.<clinit>(package.scala)
            ... 15 more

The image is using Java 8 and Python 3.8.6.
Does anyone know what I am doing wrong?

Are you sure you need to compile Hail? `pip install hail` should work just fine on Ubuntu.

Unfortunately, yes. I made some adjustments to the Scala code. I can also compile it locally on an Ubuntu system; the only obvious difference there is that I have Python 3.7 installed.

I think your main issue is that there is no .git folder in the Docker build context. GNU make doesn't fail when a subcommand used to define a variable fails, so you never saw an error.
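To see why this is silent: `$(shell …)` in a Makefile only captures stdout, so a failing command just yields an empty string instead of aborting the build. A minimal sketch (hypothetical Makefile, run in a directory that is not inside a git repository):

```shell
# $(shell ...) in GNU make captures stdout only; if the command fails,
# the variable silently becomes empty instead of stopping the build.
cd "$(mktemp -d)"   # empty directory: no .git, so git rev-parse fails
printf 'REV := $(shell git rev-parse HEAD 2>/dev/null)\nall:\n\t@echo "revision=[$(REV)]"\n' > Makefile
make -s             # prints: revision=[]
```

This is exactly what happens to Hail's revision variables when .git is missing from the build context.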

If you want to build inside Docker without carrying the whole Hail repo into the final image, I'd use a multi-stage build:

    FROM ubuntu:20.10 AS build
    RUN apt-get update && apt-get install -y wget make g++ curl libopenblas-base liblapack3 liblz4-dev rsync openjdk-8-jre-headless openjdk-8-jdk python3 python3-pip
    COPY hail /hail
    RUN cd /hail && \
        make -C hail wheel HAIL_COMPILE_NATIVES=1 && \
        cd hail/build/deploy/dist && \
        tar -cf /wheel-container.tar hail-*-py3-none-any.whl

    FROM ubuntu:20.10 AS hail
    RUN apt-get update && apt-get install -y libopenblas-base liblapack3 openjdk-8-jre-headless python3 python3-pip
    COPY --from=build /wheel-container.tar /wheel-container.tar
    RUN tar -xf /wheel-container.tar && pip3 install hail-*-py3-none-any.whl
    RUN pip3 install jupyterlab
    CMD ...

The tar nonsense is to avoid explicitly including the version of Hail. pip refuses to install wheel files without the version number in their names.
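The same trick outside Docker, with a made-up wheel name, to show why the glob has to be expanded next to the real file:

```shell
# Build side: archive whatever wheel was produced, letting the shell
# glob resolve the (unknown) version number.
mkdir -p dist && touch dist/hail-0.2.67-py3-none-any.whl   # hypothetical version
( cd dist && tar -cf ../wheel-container.tar hail-*-py3-none-any.whl )
# Runtime side: after extraction the correctly versioned file name is
# back, so `pip install hail-*.whl` can resolve it.
mkdir -p runtime && tar -xf wheel-container.tar -C runtime
ls runtime/hail-*-py3-none-any.whl
```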

Thanks for your help! I tried it with the multi-stage build now, but it still throws the same error.
I think HAIL_REVISION is not properly defined, as the error originates at hail/hail/src/main/scala/is/hail/package.scala:46.

So I'll try setting this somehow.

I had to remove .git from the .dockerignore. I also have hail as a submodule of another repository, so I had to include the main repository as well.
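For anyone hitting the same thing, the fix boils down to two steps (a sketch; file names and paths are assumptions about a typical layout):

```shell
# 1. Make sure .dockerignore does not exclude .git, or git rev-parse
#    inside the build will silently return nothing.
grep -Fvx '.git' .dockerignore > .dockerignore.tmp && mv .dockerignore.tmp .dockerignore
# 2. With hail as a git submodule, its .git is only a pointer into the
#    superproject's .git/modules, so build from the superproject root
#    to get both into the build context.
docker build -f hail/Dockerfile -t hail-custom .
```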

Thanks a lot! The git comment saved me 🙂
