Pausing/Stopping Writing a Matrix Table then Resuming

jwillett · July 13, 2023, 11:43pm

I am using a platform that may error out while writing the result of an expensive operation. When running matrix_table.write(destination), will previous progress be recognized if the operation is interrupted for any reason? What about when exporting to BGEN format, with or without the parallel option?

danking · July 17, 2023, 2:04pm

Yeah, this is a huge headache. We’ve taken great pains to automatically detect and retry transient errors, but we’ll never catch them all.

The short answer is that we’re currently working on this feature but it won’t be ready soon. In the meantime, you can:

mt.write(..., stage_locally=True)

This writes each file to the local filesystem of the worker before copying it to the remote destination, which can help when the error frequency is related to the age of the remote connection.

The most advanced users of Hail are probably the gnomAD team. My understanding is that they tend to perform just one expensive operation per write. So, for example, they'll do one round of variant QC, write just the variant metadata, then load that data back to perform the next round. Putting a write/read between expensive rounds of QC limits the cost of transient failures: if a later round fails, only that round has to be rerun.
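For what it's worth, the write/read-between-rounds pattern can be sketched as a small helper. This is only an illustration, not anything built into Hail: `run_stage` and its parameters are hypothetical names, and it resumes at the granularity of whole stages, not partway through a single write.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def run_stage(path: str,
              compute: Callable[[], T],
              write: Callable[[T, str], None],
              read: Callable[[str], T],
              exists: Callable[[str], bool] = os.path.exists) -> T:
    """Run one expensive stage, persist it, and return the persisted copy.

    If `exists(path)` is already true, the stage is skipped and the
    earlier result is read back, so a rerun after a crash only repeats
    the stages that never finished.
    """
    if not exists(path):
        # Only pay for the expensive computation when no finished output exists.
        write(compute(), path)
    # Always continue from the stored copy, not the in-memory pipeline.
    return read(path)


# Hypothetical Hail usage (bucket paths are made up; checking for a
# `_SUCCESS` file as the "write finished" marker is an assumption):
#
# import hail as hl
# mt = hl.read_matrix_table("gs://my-bucket/input.mt")
# variant_meta = run_stage(
#     "gs://my-bucket/variant_qc_round1.ht",
#     compute=lambda: hl.variant_qc(mt).rows(),
#     write=lambda ht, p: ht.write(p, overwrite=True),
#     read=hl.read_table,
#     exists=lambda p: hl.hadoop_exists(p + "/_SUCCESS"),
# )
```

The `exists` check matters: an interrupted write can leave a partial directory behind, so testing for a completion marker is safer than testing for the directory itself.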

Can you share some of the errors you’re encountering? We can add them to the list of automatically retried transient network errors.

jwillett · July 17, 2023, 2:51pm

It was an error with the export_bgen that I posted here: Write Bgen Fatal Error - #4 by danking.

I’ll just write to an intermediate file as it does not sound like writing is a particularly expensive operation.
