Thoughts on making hail compilation more offline friendly?

I’m trying to build hail in an environment where I don’t have access to the public internet, and have run into some issues. There are a few places in the Makefile where hail pulls specific versions of header-only libraries (namely libsimdpp and Catch) from your own private mirrors. I feel like the usual move is to either 1) give users a chance to provide those deps themselves before fetching them on their behalf or 2) just stick them in the release tarball before you cut a release. I get the desire to not put that stuff in version control, but if you aren’t going to let the user opt out of using those specific deps, shouldn’t they be in the release(s)? If I made a pull request, would that be of interest to anyone else?

Same goes for elasticsearch, but it might also be cool if I could just build hail without it.


Hey @CreRecombinase !

My apologies for your difficulties compiling Hail without access to the public internet!

If you place the libsimdpp-2.1 tar file, catch.hpp, and the elasticsearch jar at hail/src/main/c/libsimdpp-2.1.tar.gz, hail/src/main/resources/include/catch.hpp, and libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT-custom-hail-spark311.jar, respectively, then the Makefiles should recognize that those files exist and not try to download them. If that’s not working, then I can look into fixing that. If there are more standard ways to support user-supplied files, I’d be happy to support that too.
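For anyone staging these files on an air-gapped machine, here is a minimal sketch of a helper that copies them into place. The three destination paths are the ones listed above; everything else is an assumption made for illustration: the function name `stage_offline_deps`, the idea that the files sit in one local download directory under their final file names, and the reading of all three paths as relative to the repository root.

```python
import shutil
from pathlib import Path

# Destination paths as given in the reply above. Treating them as relative to
# the repository root is an assumption, not something the thread confirms.
EXPECTED_PATHS = [
    "hail/src/main/c/libsimdpp-2.1.tar.gz",
    "hail/src/main/resources/include/catch.hpp",
    "libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT-custom-hail-spark311.jar",
]


def stage_offline_deps(src_dir, repo_root):
    """Copy pre-downloaded dependency files into the paths the Makefiles check.

    Hypothetical helper: expects each file to be present in src_dir under the
    same file name it has at its destination.
    """
    src_dir, repo_root = Path(src_dir), Path(repo_root)
    staged = []
    for rel in EXPECTED_PATHS:
        source = src_dir / Path(rel).name
        if not source.is_file():
            raise FileNotFoundError(f"missing pre-downloaded file: {source}")
        dest = repo_root / rel
        # Create the destination directory if the checkout does not have it yet.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)
        staged.append(dest)
    return staged
```

You would call it once before running make, for example `stage_offline_deps("/path/to/downloads", "/path/to/hail-checkout")`, where both paths are placeholders for your own setup.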

Ah, interesting, you’re referring to these GitHub “releases,” right? Those files are automatically generated by GitHub directly from our repository. AFAIK, I can’t modify those directly. I suppose we could generate a distinct tar/zip with the dependencies included. Would that satisfy your needs?

While I welcome PRs, our CI system is locked down for security reasons. As such, external contributions, particularly to the build system, are difficult.

I will say, I’m mystified as to why we don’t use gradle to fetch the elasticsearch dependency. I’m looking into making that a normal gradle dependency.

May I ask, how do you deal with gradle dependencies in this air-gapped system?

Oh, heh, we provide our own Elasticsearch build because there is no publicly supported version of elasticsearch-spark for Spark 3.1.x.

Hi there! We’ve removed the dependency on a special build of elasticsearch-spark. See “[query] Simplify Elasticsearch Dependencies” by johnc1231 (Pull Request #11410, hail-is/hail on GitHub).