Creating gnomadFreq.tsv file

I’m trying to create the gnomadFreq.tsv file for the priors needed to run the de_novo function on a VCF file that has trio calls.
How can I create this file, and what format does it have?

priors = hl.import_table('data/gnomadFreq.tsv', impute=True)
priors = priors.transmute(**hl.parse_variant(priors.Variant)).key_by('locus', 'alleles')

Thank you!

That’s just an example file that we use in the docs; you don’t need that particular file. If you want to see what’s in it, it’s on our GitHub here: https://github.com/hail-is/hail/blob/master/hail/python/hail/docs/data/gnomadFreq.tsv
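As a rough illustration of the layout the snippet above expects, here is a minimal sketch in plain Python. It assumes a two-column, tab-separated file: a `Variant` column in the `contig:pos:ref:alt` form that `hl.parse_variant` accepts, and an `AF` column holding the frequency. The `Variant` name comes from the snippet in the question; the `AF` name, the helper `write_priors_tsv`, and the records are illustrative assumptions, not real gnomAD data.

```python
import csv
import io

# Hypothetical example records: (contig, position, ref, alt, allele frequency).
records = [
    ("1", 1000, "A", "T", 0.0012),
    ("1", 2500, "G", "GA", 0.25),
    ("X", 31337, "C", "T", 0.0),
]


def write_priors_tsv(rows, handle):
    """Write rows as a two-column TSV: Variant ('contig:pos:ref:alt') and AF."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Variant", "AF"])
    for contig, pos, ref, alt, af in rows:
        writer.writerow([f"{contig}:{pos}:{ref}:{alt}", af])


# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_priors_tsv(records, buffer)
print(buffer.getvalue())
```

A file written this way should load with `hl.import_table(..., impute=True)`, after which `hl.parse_variant` can turn the `Variant` string into `locus` and `alleles` fields.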

But how do you generate a file like that to use the de_novo method?

What you need is not a file but a row (variant) annotation holding the population allele frequency. Do you have something like that? Using the gnomAD frequencies from gnomad.broadinstitute.org may be a good idea.

I see. How can I annotate the VCF with the population allele frequency from gnomAD using Hail? Sorry if this is a basic question.

How are you running Hail?

You can also get results immediately by using in-sample frequency as a baseline:

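# Split multi-allelic sites so each row has exactly one alternate allele
mt = hl.split_multi_hts(mt)
# Adds a variant_qc row field; AF[1] is the in-sample alternate allele frequency
mt = hl.variant_qc(mt)
pedigree = hl.Pedigree.read('data/trios.fam')
results = hl.de_novo(mt, pedigree, pop_frequency_prior=mt.variant_qc.AF[1])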
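To use gnomAD frequencies instead, one option is to join a gnomAD sites table onto the rows of the dataset. The sketch below is only an outline under several assumptions: that you have a gnomAD sites Hail Table keyed by `locus` and `alleles` on the same reference genome as your data, and that its overall frequency lives at `freq[0].AF` (check the table's schema with `describe()`, since field layout differs between releases). The helper name `annotate_gnomad_af`, the `gnomad_af` field, and the paths in the comments are hypothetical.

```python
def annotate_gnomad_af(mt, gnomad_ht):
    """Add a row field `gnomad_af` to `mt` by joining on the (locus, alleles) key.

    Assumes `mt` has already been split with hl.split_multi_hts and that
    `gnomad_ht` stores the overall allele frequency at freq[0].AF.
    """
    # Indexing a Table with the MatrixTable row key performs a left join.
    gnomad_row = gnomad_ht[mt.row_key]
    return mt.annotate_rows(gnomad_af=gnomad_row.freq[0].AF)


# Hypothetical usage (all paths are placeholders):
#
# import hail as hl
# mt = hl.split_multi_hts(hl.import_vcf('trio.vcf.bgz'))
# gnomad_ht = hl.read_table('gnomad.sites.ht')
# mt = annotate_gnomad_af(mt, gnomad_ht)
# pedigree = hl.Pedigree.read('trio.fam')
# # Variants absent from gnomAD come back missing; here they are treated as 0.
# prior = hl.or_else(mt.gnomad_af, 0.0)
# results = hl.de_novo(mt, pedigree, pop_frequency_prior=prior)
```

Treating variants that are absent from gnomAD as frequency 0 is a judgment call; you may prefer to filter them out or handle them separately.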

Thank you for the help.
I’m running Hail on the output of the GATK Best Practices pipeline, and I have just one trio, so I guess I’d have to annotate with gnomAD, but I can’t find a way to do that in the tutorials.

Ah! I see.

In that case, this algorithm may not be the right one – it’s designed to work on cohorts, where there’s information from looking at frequencies across all your samples.

I think if you just set that parameter to 0, you’ll get mostly the results you expect, though the HIGH / MEDIUM / LOW confidence assigned to each call should be taken with a grain of salt.