Hi all,
I’ve been playing around with Databricks as a potential variant store solution for our team at the University of Melbourne Centre for Cancer Research.
Project Glow has not been updated for a while, so it is out of date with the latest Databricks runtime versions, which use later versions of Spark. This in turn means it cannot access more recent Databricks features such as Unity Catalog.
The Spark 3.4 / Hail compatibility issue shows that Hail can easily support Spark 3.4 with a few small changes, which means we can use the latest LTS runtime for Databricks (13.3 LTS) with Hail!
I’ve managed to build a Docker container based on the latest version of the Databricks runtime. See my docker-who repo for more info.
I’m also looking into Unity Volumes for non-tabular data (like MatrixTables). Unfortunately, Unity Volumes are still in public preview and require a bit of file-copy manipulation to work with Hail (for now).
I’m excited to see whether Unity Volumes will support Hail reads and writes directly soon.
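A minimal sketch of what that file-copy step could look like, not necessarily the exact approach used here. It assumes a MatrixTable is an ordinary directory that has already been written somewhere Hail can reach, and that both that location and the Volume are visible as plain filesystem paths on the driver. The helper names `stage_to_volume` and `fetch_from_volume` are hypothetical, and every path in the comments is a placeholder.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_to_volume(written_dir: str, volume_dir: str) -> Path:
    """Copy a finished MatrixTable directory into a Volume path.

    Assumes Hail has already written `written_dir` (mt.write(...)) and that
    `volume_dir` is a filesystem path such as /Volumes/<catalog>/<schema>/<volume>/x.mt.
    """
    src, dst = Path(written_dir), Path(volume_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    if dst.exists():
        shutil.rmtree(dst)  # replace any previous copy
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)
    return dst


def fetch_from_volume(volume_dir: str, scratch_root: Optional[str] = None) -> Path:
    """Copy a MatrixTable directory out of a Volume into a fresh scratch
    directory, returning the path that Hail would then read from."""
    src = Path(volume_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    scratch = Path(tempfile.mkdtemp(dir=scratch_root))
    dst = scratch / src.name
    shutil.copytree(src, dst)
    return dst


# Hypothetical usage on a cluster (placeholder paths; assumes a location that
# Hail addresses as dbfs:/... is also visible to plain Python as /dbfs/...):
#   mt.write("dbfs:/tmp/cohort.mt", overwrite=True)
#   stage_to_volume("/dbfs/tmp/cohort.mt",
#                   "/Volumes/<catalog>/<schema>/<volume>/cohort.mt")
#   local = fetch_from_volume("/Volumes/<catalog>/<schema>/<volume>/cohort.mt",
#                             scratch_root="/dbfs/tmp")
#   mt = hl.read_matrix_table("dbfs:/tmp/" + local.parent.name + "/cohort.mt")
```

The copy runs on the driver only, so it is likely to be slow for large MatrixTables; it is a stopgap for as long as Hail cannot read and write Volumes directly.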
I will continue to post my findings to this topic for those interested in using Hail on Databricks.
Links posted in comments below!
Alexis