@tpoterba, how can I calculate MAF manually?
Still waiting for a response, mates.
EDIT: I deleted my old answer; I see now what variant_qc returns.
You could just get the MAF from variant_qc directly by reading the AF array that variant_qc returns (one entry per allele, reference included; at a biallelic site the MAF is the smaller of the two).
By “manually”, do you mean “without calling variant_qc”?
Yes, I want it without variant_qc or info.AF; I want to calculate it myself and annotate it back into the data.
Disclaimer: I am not a geneticist.
You could do something like:
mt = mt.annotate_rows(MAF = (2 * hl.agg.count_where(mt.GT.is_hom_var()) + hl.agg.count_where(mt.GT.is_het()))
                      / (2 * hl.agg.count_where(hl.is_defined(mt.GT))))
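I’m pretty sure that’s right. If I understand correctly, it is computed by taking 2 × the number of hom-var samples, adding the number of hets, and dividing by the total number of alleles (2 times the number of samples with a called genotype). Strictly speaking, that is the alternate allele frequency; it equals the MAF only when the alternate allele is the rarer one.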
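The counting formula can be checked on toy data outside Hail. This is a minimal pure-Python sketch, assuming each genotype is encoded as its number of alternate alleles (0, 1, 2) or None when the call is missing; the function name `alt_allele_frequency` is hypothetical and not part of Hail. Missing genotypes are excluded from both the numerator and the denominator.

```python
from typing import Optional, Sequence


def alt_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Alternate allele frequency at one biallelic site (hypothetical helper).

    Each entry is the number of alternate alleles a sample carries
    (0 = hom ref, 1 = het, 2 = hom var) or None if the call is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no called genotypes: the frequency is undefined
    n_hom_var = sum(1 for g in called if g == 2)
    n_het = sum(1 for g in called if g == 1)
    # 2 alt alleles per hom-var sample plus 1 per het,
    # over 2 alleles per sample with a called genotype
    return (2 * n_hom_var + n_het) / (2 * len(called))


# 2 hom-ref, 1 het, 1 hom-var, 1 missing -> (2*1 + 1) / (2*4) = 3/8
print(alt_allele_frequency([0, 0, 1, 2, None]))  # 0.375
```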
You could also use the call_stats aggregator directly, which is easier and definitely going to be right: https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.call_stats
I don’t know whether the call_stats aggregator meets your definition of computing manually, though, as I’m not sure what you’re trying to do.
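For intuition about what call_stats reports, here is a pure-Python sketch that mimics its documented AC (per-allele counts), AN (total called alleles) and AF (per-allele frequencies) fields for diploid calls. It assumes each call is a tuple of allele indices (0 = reference) or None when missing; `call_stats_like` is a hypothetical name and this is not Hail's implementation. At a biallelic site, the MAF is then the smaller of the two AF entries.

```python
from typing import Optional, Sequence, Tuple


def call_stats_like(calls: Sequence[Optional[Tuple[int, int]]], n_alleles: int) -> dict:
    """Count alleles over non-missing diploid calls, in the spirit of call_stats."""
    ac = [0] * n_alleles  # AC: number of times each allele is observed
    for call in calls:
        if call is None:
            continue  # missing calls contribute nothing to AC or AN
        for allele in call:
            ac[allele] += 1
    an = sum(ac)  # AN: total number of called alleles
    af = [count / an for count in ac] if an > 0 else None
    return {"AC": ac, "AN": an, "AF": af}


stats = call_stats_like([(0, 0), (0, 0), (0, 1), (1, 1), None], n_alleles=2)
print(stats)  # {'AC': [5, 3], 'AN': 8, 'AF': [0.625, 0.375]}
maf = min(stats["AF"])  # biallelic site: the minor allele is the rarer of the two
print(maf)  # 0.375
```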
I’m seeing some errors in the MAFs, so I’m trying to annotate them manually to make sense of it. Thanks for the help; I will try this and get back to you tomorrow.
The most manual way I can think of is to first calculate the alternative allele frequency (AAF):
import hail as hl

# The alternative allele frequency is the number of alternative alleles
# divided by the total number of alleles (i.e. 2 times the number
# of individuals with a non-missing genotype at that site).
mt = mt.annotate_rows(
    AAF = hl.agg.sum(mt.GT.n_alt_alleles()) / (2 * hl.agg.count_where(hl.is_defined(mt.GT.n_alt_alleles())))
)
and then take the minor allele frequency to be the AAF if it is below 0.5, or 1 - AAF otherwise.
# Now your minor allele frequency will be either AAF or 1 - AAF.
# (hl.if_else replaces the older, deprecated hl.cond.)
mt = mt.annotate_rows(
    MAF = hl.if_else(mt.AAF < 0.5, mt.AAF, 1 - mt.AAF)
)
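The AAF and MAF annotations can be mirrored in plain Python to sanity-check values on a handful of samples. This is a sketch, assuming genotypes are given as alternate-allele counts (0, 1, 2) or None for missing; `aaf` and `maf` are hypothetical helper names. A site with AAF exactly 0.5 goes to the 1 - AAF branch, which is also 0.5, so the boundary choice does not change the result.

```python
from typing import Optional, Sequence


def aaf(n_alt: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of alt alleles divided by 2 * number of non-missing genotypes."""
    called = [g for g in n_alt if g is not None]
    if not called:
        return None  # no called genotypes: AAF is undefined
    return sum(called) / (2 * len(called))


def maf(alt_freq: float) -> float:
    """Fold the alternate allele frequency so the result is at most 0.5."""
    return alt_freq if alt_freq < 0.5 else 1 - alt_freq


# Mostly hom-var site: here the alternate allele is the major allele.
site = [2, 2, 1, 2, None]
print(aaf(site))       # 0.875 (7 alt alleles out of 8 called)
print(maf(aaf(site)))  # 0.125
```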
Thank you. Can you tell me if this is exactly how variant_qc calculates MAF?
This is how variant_qc computes allele frequencies.