I’ve noticed that the documentation for HailContext indicates tmp_dir defaults to /tmp. Is there a way to change this default through a command line parameter or environment variable (it doesn’t seem to recognize TMP, TMPDIR or TMP_DIR)?
I’ve been running Hail on a UGE cluster very similar to the Broad’s own cluster, but configured so that /scratch/$USER is the location for temporary files and very little space is allocated to /tmp. While debugging some code in an interactive job, running python/hail from a shell, I tried exporting a VDS to VCF and ran out of space on /tmp (despite pointing the previously mentioned environment variables, SPARK_LOCAL_DIRS, and _JAVA_OPTIONS=-Djava.io.tmpdir at /scratch/$USER).
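For concreteness, this is roughly the environment I had in place, expressed here as Python (in my case the variables were exported in the shell before launching python/hail):

```python
import os

# Roughly what was exported before launching python/hail. None of these
# appear to influence where HailContext puts its temporary files.
scratch = os.path.join("/scratch", os.environ["USER"])
os.environ["TMP"] = scratch
os.environ["TMPDIR"] = scratch
os.environ["TMP_DIR"] = scratch
os.environ["SPARK_LOCAL_DIRS"] = scratch
os.environ["_JAVA_OPTIONS"] = "-Djava.io.tmpdir=" + scratch
```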
I found that using a hardcoded path in the constructor was effective (e.g. hc = HailContext(tmp_dir="/scratch/rca")), but this is obviously not ideal.
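A less hardcoded variant of that workaround, for anyone else who hits this, is to take the path from an environment variable at construction time (HAIL_TMP_DIR here is just a name I made up, not something Hail itself looks for):

```python
import os
from hail import HailContext

# Workaround sketch: read the temp directory from an environment variable
# (HAIL_TMP_DIR is an illustrative name, not recognized by Hail) and fall
# back to TMPDIR, then /tmp, if it isn't set.
tmp = os.environ.get("HAIL_TMP_DIR", os.environ.get("TMPDIR", "/tmp"))
hc = HailContext(tmp_dir=tmp)
```

This still requires every user (or every script) to opt in, which is exactly what I’d like to avoid.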
Exactly. Since the entire cluster is configured to use /scratch as its temporary directory, with only limited space on /tmp, it would be ideal to be able to set an environment variable, for example in the script that runs when someone loads the spark module. That way it would “just work”, and individual users wouldn’t need to remember to set it in their scripts, or even in their .bashrc.