MethodTooLargeException when running vds.combiner

Hello

I am experimenting with Hail for joint calling GVCFs. I am using a standalone Spark cluster (master and worker on the same node: 40 cores, 512 GB of memory).

I am getting the error below when running the VDS combiner:

ERROR: lir exception is.hail.relocated.org.objectweb.asm.MethodTooLargeException: Method too large: __C9354collect_distributed_array_matrix_multi_writer.__m9381split_Switch ()V:

My command looks like this; gvcfs is a list of about 4500 exome GVCF paths.

import hail as hl
hl.init(master="spark://node005-default:7077", spark_conf={"spark.executor.cores": "4", "spark.executor.memory": "48g", "spark.driver.memory": "20g"})

combiner = hl.vds.new_combiner(
    output_path='/ssd/scratch/dataset.vds',
    temp_path='/ssd/scratch/dataset.tmp',
    gvcf_paths=gvcfs,
    use_exome_default_intervals=True,
    reference_genome='GRCh38'
)
combiner.run()

I am able to get this to run when using a much smaller number of GVCFs (though much more slowly than I would expect; that may be related, but I'm not sure).
I have almost no experience with Hail and Spark, but I noticed that the executors were only using around 2 GB each while running. Is such a low amount expected?

Thank you for any help or pointers.
Jake

Attached is the log
hail-20240628-1617-0.2.131-37a5ba226bae.log (1.2 MB)

I would recommend decreasing the batch size. You can do this using the gvcf_batch_size parameter to new_combiner. I'd set it to 25 to start.
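
For reference, a minimal sketch of what that change might look like, reusing the paths and options from the post above; the only addition is gvcf_batch_size, and 25 is just a starting value:

import hail as hl

combiner = hl.vds.new_combiner(
    output_path='/ssd/scratch/dataset.vds',
    temp_path='/ssd/scratch/dataset.tmp',
    gvcf_paths=gvcfs,
    use_exome_default_intervals=True,
    reference_genome='GRCh38',
    gvcf_batch_size=25,  # combine fewer GVCFs per batch; smaller batches should keep the generated code per task under the JVM method-size limit (per the suggestion above)
)
combiner.run()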


Thank you for the reply and suggestion. That does get it to run, but it is making very slow progress and will take over two weeks at this rate. Is that expected? I could increase the cluster size, but I thought this relatively small number of samples would not be a problem.

The process is also only using about 50 GB out of the 500 GB of memory. I would expect the memory needs to be higher. Is it possible I did not configure Spark correctly? The UI reports 10 executors with 4 cores and 48 GB each.

Thank you for your help