Tempo of infrastructure upgrades/containerization

My understanding is that using Hail requires JDK 8 …; to avoid changing
my machine's JDK, I would prefer to use Hail in a container. I have seen a
Dockerfile in the discussions and in a GitHub repo, but it doesn't seem up
to date. In light of this comment:

We will probably support Spark 3.2 when major cloud providers have released Spark 3.2 images in general availability. That said, it’s certainly possible Hail will work out of the box built against 3.2.

it seems the tempo of Hail infrastructure upgrades is tied to "major cloud providers". That's
a reasonable limitation, but I feel there is a lot of potential for using Hail outside commercial
cloud settings, and that would be more readily achieved if an endorsed container image or Dockerfile were
available. If this isn't in the cards and one should just edit the available Dockerfile, let us know.

In terms of Hail-team-endorsed, publicly accessible images, there is hailgenetics/hail on DockerHub. It is based on this Dockerfile in the repo, where {{ hail_ubuntu_image.image }} is replaced with an image built from this Dockerfile.
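For readers who want to keep their host JDK untouched, a minimal sketch of running a Hail script inside that image is below. It only builds the `docker run` argument list; the image tag `hailgenetics/hail:0.2.108`, the mount point `/work`, and the helper name `docker_run_command` are hypothetical choices for illustration, and it assumes the image provides `python3` and has no entrypoint that overrides the command. Check DockerHub for the tags that actually exist.

```python
import os
import shlex

# Hypothetical tag: check DockerHub for the tags that actually exist.
HAIL_IMAGE = "hailgenetics/hail:0.2.108"


def docker_run_command(host_dir: str, script: str, image: str = HAIL_IMAGE) -> list[str]:
    """Build a `docker run` argument list that runs a Hail script in a container.

    The host directory is bind-mounted at /work (a hypothetical path) so the
    script and its data are visible inside the container; the container's own
    JDK is used, so the host JDK stays as it is.
    """
    host_dir = os.path.abspath(host_dir)
    return [
        "docker", "run",
        "--rm",                     # remove the container when it exits
        "-v", f"{host_dir}:/work",  # bind-mount the working directory
        "-w", "/work",              # start inside the mounted directory
        image,
        "python3", script,          # assumes no overriding ENTRYPOINT
    ]


if __name__ == "__main__":
    # Print a shell-quoted command you could paste into a terminal,
    # or pass the list directly to subprocess.run.
    print(shlex.join(docker_run_command(".", "my_analysis.py")))
```

Returning a list rather than a single string lets the command go straight to `subprocess.run` without shell quoting issues.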

Can you say a bit more about what you need? Are you looking for a Dockerfile with Hail, Spark 3.2, and JDK 11?

Thanks for this, I did not know about the DockerHub resource. I just want to be able to work with current Hail without downgrading my own infrastructure, e.g., JDK 17. If there are "development" versions of Hail that depend on more recent Spark and JDK releases, it would be good to know. I don't spend much time around the GitHub source; maybe I should…


That's definitely reasonable. We're using Spark 3.3 these days since that's what Dataproc is using. I think newer JVMs break our use of sun.misc.Unsafe, which is pervasive in Hail.
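Since a JVM that is too new is the likely failure mode here, a small pre-flight check can surface the mismatch before Hail starts. The sketch below parses `java -version` output, handling both the legacy `1.8.0_x` scheme and the modern `11.0.x` / `17` scheme. The supported set `{8, 11}` is an assumption based on the JDK versions mentioned in this thread, not an official list; confirm it against the installation docs for your Hail release. The helper names are hypothetical.

```python
import re
import subprocess

# Assumption: the Hail 0.2.x releases discussed here run on JDK 8 or 11.
# Verify against the installation docs for the release you use.
SUPPORTED_JAVA_MAJORS = {8, 11}


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Handles the legacy scheme ("1.8.0_292" -> 8) and the modern
    scheme ("11.0.2" -> 11, "17" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"unrecognized java -version output: {version_output!r}")
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and earlier report themselves as 1.x
    if first == 1 and second is not None:
        return int(second)
    return first


def check_java() -> int:
    """Run `java -version` and raise if the major version is not supported."""
    # `java -version` writes its report to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    major = java_major_version(result.stderr)
    if major not in SUPPORTED_JAVA_MAJORS:
        raise RuntimeError(
            f"Java {major} found; expected one of {sorted(SUPPORTED_JAVA_MAJORS)}"
        )
    return major
```

Calling `check_java()` inside the container, or before `hl.init()` on a bare host, turns an obscure JVM error into an explicit version message.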

We should advertise the DockerHub image better, maybe in the installation section.

We're working towards a fully Sparkless backend. It'll probably need some polish, but it will be ready in the next month or so. You could use that without Spark, but you'd still be stuck with our old JVM dependency. In the longer term, we plan to move off the JVM entirely by generating native code through LLVM, which should free you from the JVM altogether. That's a couple of years of work away, though.

Hi, I have not been able to follow the evolution of the system, so I wonder if I can get an update on these remarks. Any pointers on how to use the Sparkless backend at this time would be welcome.

Additionally, my deployment of Hail 0.2.108 on Ubuntu systems for Intel x86 works fine but fails on ARM Linux. Do you have any experience with that platform?