Log4j vulnerability in Dataproc

Dear Hail team,

Because of the Log4j vulnerability (CVE-2021-44228), GCP has deprecated some Dataproc image versions. I use Hail 0.1 with image 1.1-debian9 (with Spark 2.0.2), and it works well. However, 1.1-debian9 is now deprecated, and I cannot find a suitable newer Dataproc image version for my Hail 0.1 jar. Does anyone know how to solve this? Which image can I use now? Any help is welcome and much appreciated!

I also considered moving to Hail 0.2, but my input is not a VCF. It is a merged VDS containing all autosomal chromosomes, and it has already gone through all QC steps. All I need now is to split it by chromosome and export each one to VCF. I cannot find a suitable function in Hail 0.2 for this task. I found hail.vds.to_dense_mt, but its documentation says: “Hail 0.1 also had a Variant Dataset class. Although pieces of the interfaces are similar, they should not be considered interchangeable and do not represent the same data.” So I am not sure whether I can use it.

Thanks a lot, Shuang

It’s true, there’s nothing in Hail 0.2 that can import your Hail 0.1 VDS. You’ll have to use Hail 0.1 to export your VDS to a VCF, then import that VCF with Hail 0.2. You might also have to export the column annotations as a text file and then read them in with Hail 0.2.
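As a rough illustration of that export-then-import route, here is a minimal sketch, not a tested recipe. It assumes the contigs in the VDS are named "1" through "22" and that the Hail 0.2 default reference genome matches the data. The paths and the helper names (`contig_filter_expr`, `vcf_path`, `export_per_chromosome`, `import_per_chromosome`) are hypothetical. The two main functions are meant to run in separate environments, the first under Hail 0.1 and the second under Hail 0.2. Column annotations are not handled here.

```python
# Sketch only: paths and helper names are hypothetical, and the two main
# functions run in different environments (Hail 0.1 first, then Hail 0.2).

# Assumption: autosomes are named "1".."22" (not "chr1".."chr22").
AUTOSOMES = [str(i) for i in range(1, 23)]


def contig_filter_expr(contig):
    """Hail 0.1 expression-language filter that keeps one contig's variants."""
    return 'v.contig == "{}"'.format(contig)


def vcf_path(out_prefix, contig):
    """Per-chromosome output path; the .bgz suffix requests block-gzip."""
    return '{}.chr{}.vcf.bgz'.format(out_prefix, contig)


def export_per_chromosome(vds_path, out_prefix):
    """Run under Hail 0.1: write one VCF per autosome from the merged VDS."""
    from hail import HailContext  # Hail 0.1 import
    hc = HailContext()
    vds = hc.read(vds_path)
    for contig in AUTOSOMES:
        (vds.filter_variants_expr(contig_filter_expr(contig))
            .export_vcf(vcf_path(out_prefix, contig)))


def import_per_chromosome(out_prefix, mt_prefix):
    """Run under Hail 0.2: import each exported VCF and save a MatrixTable."""
    import hail as hl  # Hail 0.2 import
    for contig in AUTOSOMES:
        mt = hl.import_vcf(vcf_path(out_prefix, contig))
        mt.write('{}.chr{}.mt'.format(mt_prefix, contig))
```

If per-chromosome VCFs are the end goal, only `export_per_chromosome` is needed, for example `export_per_chromosome('gs://my-bucket/merged.vds', 'gs://my-bucket/out/merged')` with placeholder bucket paths.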

The Hail team no longer supports building 0.1 or figuring out how to get it to work on Dataproc. If the image is only “deprecated”, you may still be able to use it, but if it’s completely gone, I’m not sure what the best option is. The Hail 0.1 code remains available on GitHub, in the 0.1 branch of hail-is/hail, if that’s useful.
You might be able to build Hail 0.1 with Spark 2.4.8 and use it on Dataproc image 1.5-debian10 if that becomes the only way for you to move forward.
