Merge multiple sparse MTs into one sparse MT

Hello,

In case you did not see my question in the reply to my other post, I am asking it again here.

I updated Hail to version 0.2.56 and tested the run_combiner() function with 100, 1k, and 10k gvcf files (average size: 6 GB each). The run_combiner() runs for 100 gvcfs and 1k gvcfs completed successfully, after multiple retries of failed subtasks that showed messages similar to those I reported before. The run time with the new version was faster than with the previous Hail version. Thanks very much for your and your team’s work.

The job for 10k gvcfs failed, however. The first round of 100 batches, each merging 100 gvcfs into a sparse MT, completed successfully, but the job failed when starting the second round with the error message below. If you can let me know how to resolve this issue, I would really appreciate it.

– Caused by: java.io.IOException: All datanodes [DatanodeInfoWithStorage[*******,DISK]] are bad

Also, I found the 100 sparse MTs generated by the first round of the run_combiner() run in my temp storage.
Is it possible to combine those 100 MTs into 1 MT with another Hail function?
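
Just to illustrate what I found, each of these intermediates opens as an ordinary matrix table (the paths below are placeholders; the actual file names in my temp bucket are different):

import hail as hl

# Placeholder paths standing in for the 100 first-round intermediates in my temp bucket.
intermediate_paths = [
    'gs://[my_bucket]/[temp_folder]/combiner-intermediate-0.mt',
    'gs://[my_bucket]/[temp_folder]/combiner-intermediate-1.mt',
    # ... and 98 more
]

# Each one reads back as a regular sparse matrix table with about 100 samples.
mt = hl.read_matrix_table(intermediate_paths[0])
print(mt.count_cols())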

Thank you.
-Jina

Thanks very much for your patience on this. Could you share your call to the combiner? What temporary directory are you using? The “data nodes are bad” error might mean that the nodes are full; I think that’s the exception HDFS throws when it runs out of storage space.
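
If it helps, you could sanity-check this directly on the cluster’s master node with standard HDFS tooling (nothing Hail-specific; these are ordinary Hadoop commands):

hdfs dfsadmin -report
hdfs dfs -df -h /

If the datanodes’ remaining capacity drops to near zero while the job is running, that would point to the storage explanation.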

Hi Tim @tpoterba,

Thanks for your reply. I have attached the Python script that calls the run_combiner function, along with my commands to start a cluster and submit the job. I hope this helps with solving my issue.

By the way, regarding the “out of storage space” you mentioned, where is that storage located and how can I increase it?

-Jina

< Cluster on GCP >

  • my-auto-policy [myauto]: max primary workers: 10, max secondary workers: 1000

hailctl dataproc start [mycluster] --vep GRCh38 --labels=mt=hm8-10k --autoscaling-policy=[myauto] --master-machine-type=n1-highmem-8 --worker-machine-type=n1-highmem-8 --properties=dataproc:dataproc.logging.stackdriver.enable=true,dataproc:dataproc.monitoring.stackdriver.enable=true

< Job >

hailctl dataproc submit [mycluster] run_gvcf_combiner.py

< Function call in run_gvcf_combiner.py >

output_file = 'gs://[my_bucket]/[MT_folder]/10k_20200817.mt'  # output destination
temp_bucket = 'gs://[my_bucket]/[temp_folder]/'  # bucket for storing intermediate files

hl.experimental.run_combiner(inputs, out_file=output_file, tmp_path=temp_bucket, branch_factor=100, batch_size=100, reference_genome='GRCh38', use_genome_default_intervals=True)
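
For what it’s worth, my understanding of these settings (please correct me if I am wrong): with branch_factor=100, the 10k inputs should be reduced in two rounds, so the first round writes 100 intermediate sparse MTs and the second round merges those 100 into the final MT, which matches what I see in my temp bucket. A quick back-of-the-envelope check:

import math

n_inputs = 10_000
branch_factor = 100

# Round 1: 10,000 gvcfs merged 100 at a time -> 100 intermediate sparse MTs.
n_after_round_1 = math.ceil(n_inputs / branch_factor)
# Round 2: those 100 intermediates merged into the single output MT.
n_after_round_2 = math.ceil(n_after_round_1 / branch_factor)

print(n_after_round_1, n_after_round_2)  # 100 1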

Is that the full autoscaling policy? What’s the workerConfig minInstances?

You can see my autoscaling policy below.

workerConfig:
  maxInstances: 10
  minInstances: 2
  weight: 1
secondaryWorkerConfig:
  maxInstances: 1000
  weight: 1
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 1.0
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 120s

Hi @tpoterba and @chrisvittal,

If you can let me know how to solve the “out of storage space” error and/or how to merge multiple sparse matrix tables into one sparse matrix table, I would really appreciate it.

I am looking forward to your reply.

-Jina