New to Hail and trying run_combiner
I'm running `run_combiner` with a small set of input files (<100) on a small Spark cluster. It runs, but the combined matrix table that gets written seems to be just a skeleton that doesn't contain the data (it's only a few hundred KB), and when I try to read it back I get an error of the form:

```
Error summary: FileNotFoundException: File out.mt/rows/rows/parts/part-0115-1-115-0-9ea728f3-255e-ea11-92d8-f9eca3fe3045 does not exist
```

Looking in `$SPARK_WORKER_DIR` on each node, I can find `out.mt` directories that contain gigabytes of data.
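For reference, the read-back that hits the error is essentially just this (a minimal sketch; `out.mt` stands for the same output path as in the error above):

```python
import hail as hl

hl.init()

# Read the combined output back; the FileNotFoundException above appears
# once the row partitions are actually accessed.
mt = hl.read_matrix_table('out.mt')
print(mt.count())
```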
Is there something wrong with the Spark or Hail setup? I have (see the sketch after this list):

- `$SPARK_WORKER_DIR` and `$SPARK_LOCAL_DIRS`: local to each node.
- For the call to `hl.init`: `tmp_dir` should be globally visible, but `local_tmpdir` is local.
- For the call to `run_combiner`: `out_file` and `tmp_path` should be globally visible.
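A rough sketch of the calls, with placeholder paths standing in for my actual layout (the `hdfs://` paths stand in for the globally visible locations, the `/local/...` one for node-local scratch); the GVCF list and `reference_genome` are illustrative, not my exact values:

```python
import hail as hl

# Placeholder paths: hdfs:// ones stand in for globally visible locations,
# the /local/... one for node-local scratch.
hl.init(
    tmp_dir='hdfs:///tmp/hail',           # globally visible
    local_tmpdir='/local/scratch/hail',   # local to each node
)

# Fewer than 100 input GVCFs (names are placeholders)
gvcf_paths = [f'/shared/gvcfs/sample_{i:03d}.g.vcf.bgz' for i in range(90)]

hl.experimental.run_combiner(
    gvcf_paths,
    out_file='out.mt',                     # the out.mt from the error above
    tmp_path='hdfs:///tmp/hail/combiner',  # globally visible
    reference_genome='GRCh38',             # illustrative; not necessarily my build
)
```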
Thanks.