I’m trying to create the gnomadFreq.tsv file for the priors needed to run the de_novo function on a vcf file that has trio calls.
What is a way to create this file and what format does it have?
# gnomadFreq.tsv is a TSV, so it's loaded with import_table (not import_vcf)
priors = hl.import_table('data/gnomadFreq.tsv', impute=True)
# parse the 'Variant' string into locus/alleles and key the table by them
priors = priors.transmute(**hl.parse_variant(priors.Variant)).key_by('locus', 'alleles')
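For reference, that file is just a two-column TSV: a `Variant` string in `contig:pos:ref:alt` form (which `hl.parse_variant` understands) and an `AF` column holding the population allele frequency. A minimal sketch of writing such a file yourself — the variants and frequencies below are made up for illustration:

```python
import csv

# made-up example rows: (variant string, population allele frequency)
rows = [
    ("1:904165:G:A", 0.0821),
    ("1:909917:G:A", 0.0341),
    ("2:1234567:C:T", 0.0005),
]

with open("gnomadFreq.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["Variant", "AF"])  # header columns referenced by the Hail snippet
    writer.writerows(rows)
```

Any source of per-variant population frequencies can be dumped into this shape and imported the same way.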
That’s just an example file that we use. You don’t need that particular file. If you want to see what’s in that particular file, it’s on our github here: https://github.com/hail-is/hail/blob/master/hail/python/hail/docs/data/gnomadFreq.tsv
But how do you generate a file like that to use the de_novo method?
The thing you need is not a file but a variant annotation giving the population frequency. Do you have something like that? Using the gnomAD frequencies from gnomad.broadinstitute.org may be a good idea.
I see, how can I annotate the vcf with the population allele frequency from gnomad using hail? Sorry if this is a basic question.
How are you running Hail?
You can also get results immediately by using in-sample frequency as a baseline:
mt = hl.split_multi_hts(mt)
mt = hl.variant_qc(mt)
pedigree = hl.Pedigree.read('data/trios.fam')
results = hl.de_novo(mt, pedigree, mt.variant_qc.AF[1])  # AF[1] is the alt allele frequency
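For intuition about what that prior is: `variant_qc` reports `AF` as the in-sample allele frequencies, `[ref_frequency, alt_frequency]`. A plain-Python sketch of the same calculation (my own helper, not part of Hail):

```python
def in_sample_af(alt_counts):
    """In-sample allele frequencies from diploid alt-allele counts.

    alt_counts: one entry per sample -- 0, 1, or 2 alt alleles, None if missing.
    Returns [ref_frequency, alt_frequency], mirroring variant_qc's AF array.
    """
    called = [c for c in alt_counts if c is not None]
    alt = sum(called)
    total = 2 * len(called)  # two alleles per called diploid sample
    return [1 - alt / total, alt / total]

print(in_sample_af([0, 1, 2, None]))  # 3 of 6 called alleles are alt -> [0.5, 0.5]
```

With only one trio this in-sample estimate is very noisy, which is why an external reference like gnomAD is preferable.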
Thank you for the help.
I’m running Hail after GATK best practices, and using just one trio, so I guess I’d have to annotate with gnomAD, but I can’t find a way to do that in the tutorials.
Ah! I see.
In that case, this algorithm may not be the right one – it’s designed to work on cohorts, where there’s information from looking at frequencies across all your samples.
I think if you just set that frequency prior to 0 you’ll get mostly the results you expect, though the HIGH / MEDIUM / LOW confidence labels on the calls should be taken with a grain of salt.