MethodTooLargeException when running vds.combiner

Hello

I am experimenting with Hail for joint calling GVCFs. I am using a standalone Spark cluster (master and worker on the same node: 40 cores, 512 GB of memory).

I am getting the error below when running the VDS combiner:

ERROR: lir exception is.hail.relocated.org.objectweb.asm.MethodTooLargeException: Method too large: __C9354collect_distributed_array_matrix_multi_writer.__m9381split_Switch ()V:

My command looks like this; gvcfs is a list of about 4500 exome GVCF paths.

import hail as hl
hl.init(master="spark://node005-default:7077", spark_conf={"spark.executor.cores": "4", "spark.executor.memory": "48g", "spark.driver.memory": "20g"})

combiner = hl.vds.new_combiner(
    output_path='/ssd/scratch/dataset.vds',
    temp_path='/ssd/scratch/dataset.tmp',
    gvcf_paths=gvcfs,
    use_exome_default_intervals=True,
    reference_genome='GRCh38'
)
combiner.run()

I am able to get this to run when using a much smaller number of GVCFs (though much more slowly than I would expect; that may be related, but I'm not sure).
I have almost no experience with Hail and Spark, but I noticed that the executors were only using around 2 GB each while running. Is such a low amount expected?

Thank you for any help or pointers.
Jake

Attached is the log
hail-20240628-1617-0.2.131-37a5ba226bae.log (1.2 MB)

I would recommend decreasing the batch size. You can do this using the gvcf_batch_size parameter to new_combiner. I'd set it to 25 to start.
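
For reference, a minimal sketch of what that change might look like, reusing the paths and options from the post above; the only addition is gvcf_batch_size, and 25 is just a starting value:

import hail as hl

combiner = hl.vds.new_combiner(
    output_path='/ssd/scratch/dataset.vds',
    temp_path='/ssd/scratch/dataset.tmp',
    gvcf_paths=gvcfs,
    use_exome_default_intervals=True,
    reference_genome='GRCh38',
    gvcf_batch_size=25,  # combine fewer GVCFs per batch; smaller batches should keep the generated code per task under the JVM method-size limit (per the suggestion above)
)
combiner.run()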


Thank you for the reply and suggestion. That does get it to run, but it is making very slow progress and will take over two weeks at this rate. Is that expected? I could increase the cluster size, but I thought this relatively small number of samples would not be a problem.

The process is also only using about 50 GB out of the 500 GB of memory. I would expect the memory needs to be higher. Is it possible I did not configure Spark correctly? The UI reports 10 executors with 4 cores and 48 GB each.

Thank you for your help