New to Hail and trying to run `run_combiner` on a small set of input files (<100) on a small Spark cluster. The job runs, but the combined matrix table that gets written seems to be just a skeleton that doesn't contain the data (it's only a few hundred KB), and I get an error of the form
Error summary: FileNotFoundException: File out.mt/rows/rows/parts/part-0115-1-115-0-9ea728f3-255e-ea11-92d8-f9eca3fe3045 does not exist
when I try to read it back. Looking in `$SPARK_WORKER_DIR` on each node, however, I am able to find `out.mt` directories that contain gigabytes of data.
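For reference, here is a minimal sketch of roughly what I'm running (the input list, paths, and interval settings are placeholders rather than my exact setup):

```python
import hail as hl

hl.init()

# placeholder list of input GVCFs (fewer than 100 in my actual run)
gvcf_paths = ['gvcfs/sample1.g.vcf.bgz', 'gvcfs/sample2.g.vcf.bgz']

# the combiner itself finishes without complaint...
hl.experimental.run_combiner(
    gvcf_paths,
    out_file='out.mt',
    tmp_path='/path/to/tmp',
    use_genome_default_intervals=True,  # placeholder; my interval settings may differ
)

# ...but reading the result back fails with the FileNotFoundException above
mt = hl.read_matrix_table('out.mt')
mt.count()
```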
Is there something wrong with the Spark or Hail setup? I have:
- `$SPARK_WORKER_DIR` and `$SPARK_LOCAL_DIRS`: local to each node.
- For the call to `hl.init`, `tmp_dir` should be globally visible, but `local_tmpdir` is local.
- For the call to `run_combiner`, `out_file` and `tmp_path` should be globally visible (see the sketch after this list).
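To make that concrete, this is how I'm interpreting those requirements (a sketch with made-up paths; `hdfs://` stands in for whatever filesystem is visible to every node in my cluster):

```python
import hail as hl

# tmp_dir needs to be visible to all nodes (e.g. HDFS or a shared mount),
# while local_tmpdir can be node-local scratch space
hl.init(
    tmp_dir='hdfs:///hail-tmp',          # globally visible
    local_tmpdir='/local/scratch/hail',  # local to each node
)

# out_file and tmp_path should likewise point at globally visible storage
hl.experimental.run_combiner(
    gvcf_paths,                            # same placeholder input list as above
    out_file='hdfs:///data/out.mt',        # globally visible
    tmp_path='hdfs:///hail-tmp/combiner',  # globally visible
)
```

Is that the right way to set things up, or does something else need to change on the Spark side?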
Thanks.