VDS Combiner Restart from checkpoint

MsUTR · August 9, 2023, 1:15pm

Hello all! I would like to use hl.vds.new_combiner to joint-call around 15k samples. As this is a relatively large dataset, I would like to clarify two points:

1. To restart from a failed execution, is it enough to define a save_path in hl.vds.new_combiner and then run the combiner again with the same arguments, so that it resumes from the checkpoint?
2. The maximum run time for jobs at my institution is 14 days. When a job is stopped (i.e. a manual force stop), can the joint calling be resumed in a new job from the save_path by running the same function again?

Really appreciate your insights!

chrisvittal · August 9, 2023, 9:59pm

Both of these are correct; the combiner’s state is saved between executions. If you run the same script with the same version of Hail and don’t pass force=True to new_combiner, it will reuse the same plan and pick up where it left off.
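As a rough illustration of the resume pattern described above, here is a minimal sketch. The paths are hypothetical placeholders, and the helper names combiner_kwargs and run_or_resume are made up for this example; only hl.vds.new_combiner, its keyword arguments, and the combiner's run() method come from Hail. The idea is simply that the same script, with the same arguments and force left at False, is submitted again after a failure or a forced stop.

```python
def combiner_kwargs(gvcf_paths, output_path, temp_path, save_path):
    """Collect the arguments passed to hl.vds.new_combiner.

    Keeping them in one place makes it easier to rerun the combiner
    with exactly the same arguments after an interruption.
    """
    return dict(
        output_path=output_path,
        temp_path=temp_path,
        save_path=save_path,            # where the combiner plan/state is stored
        gvcf_paths=list(gvcf_paths),
        use_exome_default_intervals=True,
        force=False,                    # False: reuse an existing plan at save_path
    )


def run_or_resume(**kwargs):
    """Create (or reload) the combiner and run it."""
    import hail as hl  # imported lazily so the helper above works without Hail

    combiner = hl.vds.new_combiner(**kwargs)
    combiner.run()


# Example call (hypothetical paths); submit the same script again to resume:
# run_or_resume(**combiner_kwargs(
#     gvcf_paths=["/data/gvcfs/sample1.g.vcf.gz", "/data/gvcfs/sample2.g.vcf.gz"],
#     output_path="/data/out/cohort.vds",
#     temp_path="/data/hail_tmp",
#     save_path="/data/out/combiner_plan.json",
# ))
```

Passing force=True instead would discard the saved plan and start over, which is usually not what you want when resuming.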

MsUTR · August 10, 2023, 4:21pm

Thank you very much for your response @chrisvittal! I would like to seek your advice on another matter. I am running it on a small subset of 1000 samples and have already run into this issue:

Error summary: FileNotFoundException: ./hail_tmp/combiner-intermediates/e431b2db-2725-44a6-a869-335c29e76d53_gvcf-combine_job1/dataset_0.vds/reference_data/index/part-26-0-26-0-0ce32f55-820d-7ea9-42c9-9b24d305c8ef.idx/metadata.json.gz (Too many open files)

I infer that this is due to having too many partitions (~8000 partitions at the stage where it failed). I am using WES samples, so I set use_exome_default_intervals=True. To fix the above issue, would you recommend increasing the interval size with import_interval_size? Thank you very much!
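Since "Too many open files" is the operating system's message for hitting the per-process file descriptor limit, one thing that may be worth checking alongside the partition count is that limit itself. The sketch below uses Python's standard resource module (Unix only) to print the current limit and, where allowed, raise the soft limit to the hard limit. It assumes Hail runs in local mode and that this code runs before Hail starts its JVM, so that the new limit is inherited; whether this resolves the error above is not confirmed.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit.
    Returns the limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Skip when the hard limit is unlimited: some platforms reject that value.
    if hard != resource.RLIM_INFINITY and soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # not permitted here; keep the existing limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("before:", open_file_limits())
print("after: ", raise_soft_limit_to_hard())
```

If the hard limit is itself low, it can only be raised by an administrator or through the cluster's job configuration.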
