Hello
I am experimenting with Hail for joint-calling GVCFs. I am using a standalone Spark cluster (master and worker on the same node, 40 cores, 512 GB RAM).
I am getting the error below when using the VDS combiner:
ERROR: lir exception is.hail.relocated.org.objectweb.asm.MethodTooLargeException: Method too large: __C9354collect_distributed_array_matrix_multi_writer.__m9381split_Switch ()V:
My command looks like this; gvcfs is a list of about 4500 exome GVCF paths.
import hail as hl
hl.init(master="spark://node005-default:7077", spark_conf={"spark.executor.cores": "4", "spark.executor.memory": "48g", "spark.driver.memory": "20g"})
combiner = hl.vds.new_combiner(
    output_path='/ssd/scratch/dataset.vds',
    temp_path='/ssd/scratch/dataset.tmp',
    gvcf_paths=gvcfs,
    use_exome_default_intervals=True,
    reference_genome='GRCh38'
)
combiner.run()
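In case it is relevant, gvcfs is just a plain Python list of path strings. A rough sketch of how I build it is below (the manifest filename here is illustrative, not my actual path):

# Build the list of GVCF paths from a one-path-per-line manifest file.
# '/ssd/scratch/gvcf_manifest.txt' is just a placeholder name.
with open('/ssd/scratch/gvcf_manifest.txt') as f:
    gvcfs = [line.strip() for line in f if line.strip()]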
I am able to get this to run through with a much smaller number of GVCFs (though much slower than I would expect; that may be related, but I'm not sure).
I have almost no experience with Hail and Spark, but I noticed that the executors were only using around 2 GB each while running. Is such a low amount expected?
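For what it's worth, here is roughly how I checked the setting from the driver, in addition to looking at the Spark UI's Executors tab (I am not sure this is the right thing to look at):

# Ask the running SparkContext what it thinks spark.executor.memory is.
sc = hl.spark_context()
print(sc.getConf().get("spark.executor.memory"))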
Thank you for any help or pointers
Jake
Attached is the log
hail-20240628-1617-0.2.131-37a5ba226bae.log (1.2 MB)