Hi,
I’ve run a script several times on the same input, and the number of files in rows/parts of the output ht sometimes differs between runs. Some of the files share the same numbering, such as
part-117-100-117-0-91037b67-4f44-b6e0-1b32-dbf5bd72bccb
part-117-100-117-1-c9de1566-7096-fec1-b63c-6f7220bfbab3
but it varies each time. Is this expected in the output?
This is (mostly) expected. When Hail writes files per partition in a write step, we append a random string to each file name. This way, if a job dies or becomes a zombie (loses contact with the driver machine and keeps going), we can schedule another task to write that partition without worrying about both tasks writing to the same file path. The metadata.json.gz files encode which of these files are real and which are trash.
I say ‘mostly’ because we should clean these up.
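To make the idea concrete, here is a minimal Python sketch of the general pattern: each write attempt gets a unique file name, and a manifest records which attempt counts. This is not Hail's actual code or file format. The function names, the `manifest.json` file, and its `part_files` key are all made up for illustration.

```python
import json
import os
import tempfile
import uuid


def write_partition(out_dir, index, rows):
    """Write one partition under a unique name and return that name.

    The random suffix means two attempts at the same partition never
    collide on a file path. (Hypothetical helper, not a Hail function.)
    """
    name = f"part-{index}-{uuid.uuid4()}"
    with open(os.path.join(out_dir, name), "w") as f:
        f.writelines(f"{row}\n" for row in rows)
    return name


def write_manifest(out_dir, committed):
    """Record the accepted file for each partition index.

    `committed` maps partition index -> file name. The manifest layout
    here is an assumption for this sketch only.
    """
    part_files = [committed[i] for i in sorted(committed)]
    with open(os.path.join(out_dir, "manifest.json"), "w") as f:
        json.dump({"part_files": part_files}, f)
    return part_files


def stray_files(out_dir):
    """Return part files on disk that the manifest does not list."""
    with open(os.path.join(out_dir, "manifest.json")) as f:
        listed = set(json.load(f)["part_files"])
    return sorted(
        name for name in os.listdir(out_dir)
        if name.startswith("part-") and name not in listed
    )


out_dir = tempfile.mkdtemp()
zombie = write_partition(out_dir, 0, ["a", "b"])  # attempt that lost contact
retry = write_partition(out_dir, 0, ["a", "b"])   # rescheduled attempt
other = write_partition(out_dir, 1, ["c"])
write_manifest(out_dir, {0: retry, 1: other})
strays = stray_files(out_dir)  # only the zombie's file is left over
```

In this sketch the leftover file is harmless because readers only follow the manifest, which is why extra files in rows/parts do not change the data you read back.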
OK, thank you for the explanation!