Error summary: ClassTooLargeException: Class too large

MsUTR · July 14, 2023, 8:30pm

Hi all! I know this has been posted a few times, but I do not think that any of the solutions are applicable to this. I am trying to import a gnomAD VCF (gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz) with a simple chunk of code:

import hail as hl

# Map bare contig names (1..22, X, Y) to the chr-prefixed names that GRCh38 uses.
recode = {f"{i}": f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y'])}
mt = hl.import_vcf("./gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz", force_bgz=True, reference_genome='GRCh38', contig_recoding=recode)
mt.write('gnomad.mt', overwrite=True)

But I ran into this issue:

Hail version: 0.2.108-fc03e9d5dc08
Error summary: ClassTooLargeException: Class too large: __C30721collect_distributed_array_matrix_native_writer

I am not sure how to resolve this. Oddly, when I ran the same code on the non-liftover VCF (gnomad.exomes.r2.1.1.sites.vcf.bgz), it worked fine. I would appreciate any input on this matter!

danking · July 17, 2023, 2:26pm

Hey @MsUTR, I’m sorry you’re running into this issue. We’ll look into it, but this kind of problem is hard to fix. The root cause is that this VCF file has a very large number of fields. Hail’s parser generates code so that it can parse these kinds of VCFs quickly but, due to JVM limits on the size of a class, that generated code can grow too large.
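One rough way to see how wide a VCF is before importing it is to count the field declarations in its header. The sketch below uses only the Python standard library and relies on bgzip output being readable as ordinary gzip. The function name `count_header_fields` is made up for illustration and is not part of Hail, and the counts it returns are only a proxy for how much code the parser has to generate.

```python
import gzip
from collections import Counter


def count_header_fields(vcf_path):
    """Count ##INFO, ##FORMAT and ##FILTER declarations in a (b)gzipped VCF header."""
    counts = Counter()
    # A bgzip file is a series of concatenated gzip blocks, which gzip.open reads transparently.
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                # Meta-information lines end at the #CHROM line; stop before any records.
                break
            for kind in ("INFO", "FORMAT", "FILTER"):
                if line.startswith(f"##{kind}="):
                    counts[kind] += 1
    return counts
```

Running `count_header_fields("./gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz")` and comparing the result with the non-liftover file would show whether the liftover VCF declares more fields.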

This isn’t particularly high priority for us because gnomAD has publicly released Hail Tables for these VCFs. Using a Hail Table avoids the parsing problem and saves you the cost of importing. Can you use one of these tables instead?

gs://gcp-public-data--gnomad/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht
s3://gnomad-public-us-east-1/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht
https://datasetgnomad.blob.core.windows.net/dataset/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht
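For anyone following along, reading one of these tables could look roughly like the sketch below. `hl.read_table` is the standard Hail call for opening a Hail Table; the dictionary and the helper name `load_gnomad_exomes` are only illustrative. Note that the GCS bucket name is spelled with two plain hyphens (`gcp-public-data--gnomad`). Whether a given `gs://`, `s3://`, or `https://` path is readable depends on how your Hail/Spark installation is configured for that cloud, which this sketch simply assumes.

```python
# Published table locations from this thread, keyed by cloud provider.
GNOMAD_EXOMES_LIFTOVER_HT = {
    "gcp": "gs://gcp-public-data--gnomad/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht",
    "aws": "s3://gnomad-public-us-east-1/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht",
    "azure": "https://datasetgnomad.blob.core.windows.net/dataset/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht",
}


def load_gnomad_exomes(cloud="gcp"):
    """Open the released Hail Table instead of importing the VCF (illustrative helper)."""
    # Imported lazily so the path lookup above can be used without Hail installed.
    import hail as hl

    # read_table opens an already-written table, so no VCF parsing is involved.
    return hl.read_table(GNOMAD_EXOMES_LIFTOVER_HT[cloud])
```

Typical use would be `ht = load_gnomad_exomes("gcp")` followed by `ht.describe()` to inspect the schema.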

danking · September 1, 2023, 9:35pm

Hmm.

I was unable to reproduce this, so I closed the GitHub issue I had created for it: "[query] gnomAD 2.1.1 sites table VCF is not parseable with Hail" (Issue #13249, hail-is/hail).
