My understanding is that using Hail requires JDK 8. To avoid changing
my machine’s JDK, I would prefer to run Hail in a container. I have seen a
Dockerfile in the discussions and in a GitHub repo, but it doesn’t seem up
to date. In light of this comment:
We will probably support Spark 3.2 when major cloud providers have released Spark 3.2 images in general availability. That said, it’s certainly possible Hail will work out of the box built against 3.2.
it seems the tempo of Hail infrastructure upgrades is tied to “major cloud providers”. That’s
a reasonable limitation, but I feel there is a lot of potential for using Hail outside of commercial
cloud settings, and that would be easier to realize if an endorsed container or Dockerfile were
available. If this isn’t in the cards and one should just edit the available Dockerfile, let us know.
Thanks for this, I did not know about the DockerHub resource. I just want to be able to work with current Hail without downgrading my own infrastructure, e.g., JDK 17. If there are “development” versions of Hail that depend on a more recent Spark and JDK, it would be good to know. I don’t spend much time around the GitHub source, maybe I should…
That’s definitely reasonable. We’re using Spark 3.3 these days since that’s what Dataproc is using. I think newer JVMs break our use of sun.misc.Unsafe, which is pervasive in Hail.
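For anyone unsure which JVM a given machine or container would hand to Hail, here is a minimal Python sketch that reads the major version from `java -version`. The function names and the `REQUIRED_MAJOR` value of 8 are assumptions for illustration, taken from the JDK 8 requirement mentioned at the top of this thread rather than from Hail's documentation, so adjust it if the supported versions differ.

```python
import re
import shutil
import subprocess

# Assumption: JDK 8 is the required version, as stated earlier in this thread.
REQUIRED_MAJOR = 8


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_392" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def local_java_major():
    """Return the major version of the `java` on PATH, or None if not found."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes its banner to stderr.
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    major = local_java_major()
    if major is None:
        print("No usable java found on PATH")
    elif major != REQUIRED_MAJOR:
        print(f"Found Java {major}; this sketch assumes Java {REQUIRED_MAJOR} is needed")
    else:
        print(f"Found Java {major}")
```

Running this inside a container before starting Hail gives a quick sanity check without touching the host's JDK.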
We should advertise the DockerHub better. Maybe in the installation section.
We’re working towards a fully Sparkless backend. That’ll probably need some polish, but it should be ready in the next month or so. You could use that without Spark, but you’ll still be stuck with our old JVM dependency. In the longer term, we plan to move off the JVM entirely and generate native code through LLVM, which should free you from the JVM altogether. That’s a couple of years of work away, though.