Can I liftover a Hail Table from GRCh37 to GRCh38?

Dear Hail team:

I’m wondering if it is possible to lift over a table from GRCh37 to GRCh38? I realized there is a function to lift over a locus, but can I lift over an entire Hail Table? Thanks!

Best regards,


Konrad is planning to add a bit about this in the how-to, but here’s his example:

Lift over the locus coordinates in a Table or MatrixTable from reference
genome 'GRCh37' to 'GRCh38':

    >>> rg37 = hl.get_reference('GRCh37')  # doctest: +SKIP
    >>> rg38 = hl.get_reference('GRCh38')  # doctest: +SKIP
    >>> rg37.add_liftover('gs://hail-common/references/grch37_to_grch38.over.chain.gz', rg38)  # doctest: +SKIP
    >>> ht = ht.annotate(new_locus=hl.liftover(ht.locus, 'GRCh38'))  # doctest: +SKIP
    >>> ht = ht.filter(hl.is_defined(ht.new_locus))  # doctest: +SKIP
    >>> ht = ht.key_by(locus=ht.new_locus)  # doctest: +SKIP

Hi Tim,

Thank you very much! That solves the problem.

Best regards,

Hi Tim,

Can I follow up with a related question? I successfully lifted over a Hail Table using your code. Now I want to annotate a MatrixTable using this Table with the following command:

mt = mt.annotate_rows(gnomAD = gn[mt.locus, mt.alleles])

But I got the following error message. My suspicion is that there are some variants in the MatrixTable that are not included in the Table, which causes an error when calling “gn[mt.locus, mt.alleles]”. Could you please let me know how to fix this issue?

Hail version: 0.2.8-70304a52d33d
Error summary: SparkException: Job aborted due to stage failure: Task 340 in stage 49.0 failed 4 times, most recent failure: Lost task 340.3 in stage 49.0 (TID 174347, ip-10-66-50-98.goldfinch.lan, executor 62): ExecutorLostFailure (executor 62 exited caused by one of the running tasks) Reason: Container marked as failed: container_1547607392313_0230_01_173099 on host: ip-10-66-50-98.goldfinch.lan. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

Thank you very much!

This error (137) is usually Spark running out of memory. The code you’re running seems perfectly fine.

What kind of cluster are you running on?

Thanks for your reply!
I’m using a 24-node cluster on AWS, with 192 vCores and 576 GB of memory.

what is the full pipeline you are running?

# Load data
mt = hl.read_matrix_table('')

# gnomAD annotations
gn = hl.read_table('')

rg37 = hl.get_reference('GRCh37')
rg38 = hl.get_reference('GRCh38')
rg37.add_liftover('grch37_to_grch38.over.chain.gz', rg38)
gn = gn.annotate(new_locus=hl.liftover(gn.locus, 'GRCh38'))
gn = gn.filter(hl.is_defined(gn.new_locus))
gn = gn.key_by(locus=gn.new_locus, alleles=gn.alleles)

mt = mt.annotate_rows(gnomAD = gn[mt.locus, mt.alleles])

How many partitions in the MT and the table? Generally, you can fix memory issues by increasing the number of partitions. I’d try doubling the number of partitions on both.

I have 67376 partitions for the MT and 10000 partitions for the table. Thanks for the suggestion. I will try to increase these numbers.


Hi, Hail Team! As you all know, I think Hail is fantastic and I am planning on using it for a dataset that is in hg19 that I’d like to lift over to hg38.
One of my questions is: what is the difference between the Hail liftover function vs the one from Picard tools? Any thoughts?

We just call samtools directly to do the liftover. I don’t know what else Picard does, but Hail does not touch the alleles; we only deal with the locus itself.
