HailContext tmp_dir

Hi,

I’ve noticed that the documentation for HailContext indicates tmp_dir defaults to /tmp. Is there a way to change this default through a command line parameter or environment variable (it doesn’t seem to recognize TMP, TMPDIR or TMP_DIR)?

Thanks,
-rca

The HailContext constructor does not consult any environment variables when choosing the temporary directory.

The subtext here seems to be that you cannot modify your script like this:

import os
from hail import HailContext

# Use TMPDIR when it is set; otherwise fall back to /tmp.
hc = HailContext(tmp_dir=os.environ.get('TMPDIR', '/tmp'))
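If more than one variable might be set, the same idea can be wrapped in a small helper. This is only a sketch: resolve_tmp_dir and its parameters are hypothetical names, not part of hail, and the directory check assumes tmp_dir is a path on a locally mounted filesystem (it would need to be dropped for something like an HDFS URI).

```python
import os


def resolve_tmp_dir(env=None, candidates=("TMPDIR", "TMP", "TMP_DIR"), default="/tmp"):
    """Return the first usable directory named by one of the candidate variables.

    Hypothetical helper, not a hail API. `env` defaults to os.environ and is
    only a parameter so the lookup can be exercised without touching the
    real environment.
    """
    env = os.environ if env is None else env
    for name in candidates:
        path = env.get(name)
        # Skip unset or empty variables, and paths that are not writable directories.
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return default


# Hypothetical usage with the constructor discussed in this thread:
# hc = HailContext(tmp_dir=resolve_tmp_dir())
```

The variables are tried in the order given, so TMPDIR wins when several are set, and /tmp is used only when none of them points at a writable directory.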

Under what circumstances are you running hail?

Hi Dan,

I’ve been running hail on a UGE cluster very similar to the Broad’s own cluster, but configured so that /scratch/$USER is the location for temporary files, with very little space allocated to /tmp. While debugging some code in an interactive job, running python/hail from a shell, I tried exporting a VDS to VCF and ran out of space on /tmp (despite pointing the previously mentioned environment variables and SPARK_LOCAL_DIRS at /scratch/$USER, and setting _JAVA_OPTIONS=-Djava.io.tmpdir).

I found that using a hardcoded path in the constructor was effective (i.e. hc = HailContext(tmp_dir="/scratch/rca")), but this is obviously not ideal.

Is it not ideal because the setting is per-script rather than per shell / environment / machine?

Exactly. Since the entire cluster is configured for /scratch to be used as a temporary directory, and /tmp to have limited space, it would be ideal to be able to, for example, set an environment variable in the script that’s called when someone loads the spark module. That way it would “just work”, and individual users wouldn’t need to remember to set it in their script, or even in their .bashrc.

I created a PR to address this in our development branch. I went with TMPDIR as it’s in the POSIX standard.

Unfortunately, this is a breaking change to our Python interface, so I can’t backport it to 0.1.

One might argue this is actually a bug fix, given that we ignore a standardized environment variable. I’ll poll the community over Gitter.

Thanks for making this change :+1: