Creating gnomadFreq.tsv file

emauryg · January 31, 2020, 9:10pm

I’m trying to create the gnomadFreq.tsv file for the priors needed to run the de_novo function on a vcf file that has trio calls.
What is a way to create this file and what format does it have?

priors = hl.import_table('data/gnomadFreq.tsv', impute=True)
priors = priors.transmute(**hl.parse_variant(priors.Variant)).key_by('locus', 'alleles')

Thank you!

johnc1231 · January 31, 2020, 9:27pm

That’s just an example file that we use. You don’t need that particular file. If you want to see what’s in that particular file, it’s on our github here: https://github.com/hail-is/hail/blob/master/hail/python/hail/docs/data/gnomadFreq.tsv

emauryg · January 31, 2020, 9:43pm

But how do you generate a file like that to use the de_novo method?

tpoterba · January 31, 2020, 9:51pm

The thing you need is not a file, but a variant annotation that holds the population allele frequency. Do you have something like that? Using the gnomAD frequencies from gnomad.broadinstitute.org may be a good idea.
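For anyone who does want a file shaped like the example, here is a minimal sketch in plain Python (no Hail required). It assumes a two-column, tab-separated layout: a `Variant` column holding `chrom:pos:ref:alt` strings, which is what `hl.parse_variant(priors.Variant)` in the snippet above consumes, and an `AF` column holding the frequency. The `AF` column name, the `write_priors_tsv` helper, and the record values are illustrative assumptions, not real gnomAD data.

```python
import csv
import io


def write_priors_tsv(records, handle):
    """Write (chrom, pos, ref, alt, af) records as a two-column TSV.

    Hypothetical helper: the Variant/AF header is an assumption based on
    the snippet quoted earlier in this thread.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Variant", "AF"])
    for chrom, pos, ref, alt, af in records:
        # A frequency outside [0, 1] is almost certainly a data error.
        if not 0.0 <= af <= 1.0:
            raise ValueError(f"allele frequency out of range: {af}")
        writer.writerow([f"{chrom}:{pos}:{ref}:{alt}", af])


# Made-up example values, not real gnomAD frequencies.
records = [
    ("1", 12345, "A", "G", 0.0012),
    ("2", 67890, "C", "T", 0.25),
]

buffer = io.StringIO()
write_priors_tsv(records, buffer)
priors_text = buffer.getvalue()
print(priors_text)
```

Writing to an `io.StringIO` keeps the sketch free of side effects; passing an open file handle instead would produce a file on disk that can be loaded as a table.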

emauryg · February 1, 2020, 6:36pm

I see, how can I annotate the vcf with the population allele frequency from gnomad using hail? Sorry if this is a basic question.

tpoterba · February 3, 2020, 12:29pm

How are you running Hail?

You can also get results immediately by using in-sample frequency as a baseline:

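# Split multi-allelic sites so that every row is biallelic.
mt = hl.split_multi_hts(mt)
# Compute per-variant QC statistics, including allele frequencies (AF).
mt = hl.variant_qc(mt)
pedigree = hl.Pedigree.read('data/trios.fam')
# AF[1] is the in-sample alternate allele frequency, used here as the prior.
results = hl.de_novo(mt, pedigree, pop_frequency_prior=mt.variant_qc.AF[1])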
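To make concrete what an in-sample alternate allele frequency is, here is a rough plain-Python sketch of the arithmetic for a single biallelic site. It is not Hail's implementation; the `alt_allele_frequency` helper and the genotype encoding (pairs of allele indices, `None` for a missing call) are assumptions made for illustration.

```python
def alt_allele_frequency(calls):
    """Fraction of called alleles that are the alternate allele (index 1).

    `calls` holds one diploid genotype per sample as a pair of allele
    indices, or None for a missing call. Returns None if nothing is called.
    """
    called = [gt for gt in calls if gt is not None]
    total_alleles = 2 * len(called)
    if total_alleles == 0:
        return None
    alt_count = sum(allele == 1 for gt in called for allele in gt)
    return alt_count / total_alleles


# Hypothetical single trio: child heterozygous, both parents homozygous reference.
trio_calls = [(0, 1), (0, 0), (0, 0)]
af = alt_allele_frequency(trio_calls)
print(af)
```

With only three samples the estimate can take just a handful of values, and a variant seen once in the child already comes out at one in six, so an in-sample frequency is only informative when it is computed across many samples.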

emauryg · February 3, 2020, 8:14pm

Thank you for the help.
I’m running Hail after GATK best practices, and using just one trio so I guess I’d have to annotate with gnomad, but can’t find a way to do that from the tutorials.

tpoterba · February 3, 2020, 8:18pm

Ah! I see.

In that case, this algorithm may not be the right one – it’s designed to work on cohorts, where there’s information from looking at frequencies across all your samples.

I think that if you just set that parameter to 0, you’ll get mostly the results you expect, though the HIGH / MEDIUM / LOW confidence in calls should be taken with a grain of salt.
