I’ve run a script several times using the same input, and in the output ht I sometimes get a different number of files in rows/parts, with some having similar numbering, such as
but it varies each time. Is this typical to find in the output?
This is (mostly) expected. When Hail writes one file per partition in a write step, we append a random string to each file name. This way, if a job dies or becomes a zombie (loses contact with the driver machine and keeps going), we can schedule another task to write that partition without worrying about both tasks writing to the same file path. The metadata.json.gz files encode which of these files are real and which are trash.
I say ‘mostly’ because we should clean these up.
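To make the idea concrete, here is a minimal sketch of the general pattern. It is not Hail's actual code. The file naming scheme, the `manifest.json.gz` name, and its JSON layout are all assumptions made up for illustration. Each attempt at a partition gets a random suffix, and a manifest written at the end records which files count.

```python
import gzip
import json
import os
import tempfile
import uuid


def write_partition(parts_dir, index, rows):
    """Write one attempt at partition `index` to a uniquely named file."""
    # The random suffix means two attempts at the same partition never collide.
    # (Hypothetical naming scheme, not Hail's.)
    name = f"part-{index}-{uuid.uuid4().hex}"
    with open(os.path.join(parts_dir, name), "w") as f:
        f.write("\n".join(rows))
    return name


def commit(out_dir, committed_names):
    """Record which part files are the real ones (hypothetical manifest format)."""
    with gzip.open(os.path.join(out_dir, "manifest.json.gz"), "wt") as f:
        json.dump({"parts": committed_names}, f)


def stray_files(out_dir):
    """Return part files on disk that the manifest does not reference."""
    with gzip.open(os.path.join(out_dir, "manifest.json.gz"), "rt") as f:
        committed = set(json.load(f)["parts"])
    on_disk = set(os.listdir(os.path.join(out_dir, "parts")))
    return sorted(on_disk - committed)


out_dir = tempfile.mkdtemp()
parts_dir = os.path.join(out_dir, "parts")
os.makedirs(parts_dir)

# Partition 0: the first attempt is presumed lost, so the partition is retried.
zombie = write_partition(parts_dir, 0, ["a", "b"])
retry = write_partition(parts_dir, 0, ["a", "b"])
only = write_partition(parts_dir, 1, ["c"])

# Only the retry and the single attempt at partition 1 are committed.
commit(out_dir, [retry, only])
leftover = stray_files(out_dir)
print(len(os.listdir(parts_dir)), "files on disk,", len(leftover), "stray")
# prints: 3 files on disk, 1 stray
```

In this toy version the number of files on disk depends on how many attempts ran, while the manifest always lists exactly one file per partition, which is why the file count can differ between runs without the data differing.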
OK, thank you for the explanation!