Calculating MAF Manually

@tpoterba, how can I calculate MAF manually?

Waiting for a response, mates.

EDIT: I deleted my old answer; I see now what variant_qc returns.

You could just get the MAF from variant_qc directly by accessing the AF field it returns.

By “manually”, do you mean “without calling variant_qc”?

Yes, I want it without variant_qc or info.AF. I want to calculate it myself and annotate it back into the data.

Disclaimer: I am not a geneticist.

You could do something like:

```python
# annotate_rows returns a new MatrixTable, so assign the result back to mt
mt = mt.annotate_rows(MAF=(2 * hl.agg.count_where(mt.GT.is_hom_var()) + hl.agg.count_where(mt.GT.is_het())) / (2 * hl.agg.count()))
```

I’m pretty sure that’s right. If I understand correctly, MAF is computed by taking 2 times the number of hom-var samples, adding the number of hets, and dividing by the total number of alleles (2 times the total number of samples).
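To sanity-check the arithmetic, here is a plain-Python sketch (not Hail) of the same formula applied to made-up counts. The function name is hypothetical, and it assumes diploid calls with no missing genotypes.

```python
def alt_allele_freq_from_counts(n_het: int, n_hom_var: int, n_samples: int) -> float:
    """(2 * hom-var + het) / (2 * samples); assumes diploid calls and no missing genotypes."""
    if n_samples <= 0:
        raise ValueError("need at least one sample")
    # Each hom-var sample carries two alt alleles, each het carries one.
    return (2 * n_hom_var + n_het) / (2 * n_samples)

# Made-up counts: 100 samples, 20 heterozygous, 5 homozygous for the alt allele.
freq = alt_allele_freq_from_counts(n_het=20, n_hom_var=5, n_samples=100)  # 30 / 200
```

Because the numerator counts alternate alleles, this is strictly the alternate allele frequency; it equals the MAF only when the alternate allele is the rarer one (frequency below 0.5).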

You could also use the call_stats aggregator directly, which is easier and certain to be correct: https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.call_stats

I don’t know if the call_stats aggregator meets your definition of computing it manually, though, as I’m not sure what you’re trying to do.
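In Hail, the aggregator is called as `hl.agg.call_stats(mt.GT, mt.alleles)`, and its result includes per-allele counts (AC), the total number of called alleles (AN), and per-allele frequencies (AF). The following is a rough plain-Python imitation of those three fields, written only to show the bookkeeping; it is not Hail's implementation, and the function name and `Call` type are hypothetical.

```python
from typing import List, Optional, Tuple

Call = Optional[Tuple[int, ...]]  # allele indices of one genotype, or None if missing

def call_stats_sketch(calls: List[Call], n_alleles: int) -> dict:
    """Imitate the AC / AN / AF fields of a call-stats summary for one site."""
    ac = [0] * n_alleles
    for call in calls:
        if call is None:
            continue  # missing calls contribute to neither AC nor AN
        for allele in call:
            ac[allele] += 1
    an = sum(ac)  # total number of called alleles
    af = [count / an for count in ac] if an > 0 else None
    return {"AC": ac, "AN": an, "AF": af}

# One biallelic site: hom-ref, hom-ref, het, hom-var, missing.
calls = [(0, 0), (0, 0), (0, 1), (1, 1), None]
stats = call_stats_sketch(calls, n_alleles=2)
maf = min(stats["AF"])  # for a biallelic site, the smaller of the two frequencies
```

Here AC is [5, 3] and AN is 8, so AF is [0.625, 0.375] and the MAF is 0.375.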

I am seeing some errors in the MAFs, so I am trying to compute them manually to make sense of them. Thanks for the help; I will try this and get back to you tomorrow.

The most manual way I can think of is to first calculate the alternate allele frequency (AAF):

```python
import hail as hl

# Alternate allele frequency is the sum of alternate alleles
# divided by the total number of alleles (i.e. 2 times the number
# of individuals with a non-missing genotype at that site).
mt = mt.annotate_rows(
    AAF = hl.agg.sum(mt.GT.n_alt_alleles()) / (2 * hl.agg.count_where(hl.is_defined(mt.GT.n_alt_alleles())))
)
```

Then take the minor allele frequency as AAF if AAF is below 0.5, or 1 - AAF otherwise:

```python
# Now your minor allele frequency will be either AAF or 1 - AAF.
# (hl.if_else replaces the older, deprecated hl.cond.)
mt = mt.annotate_rows(
    MAF = hl.if_else(mt.AAF < 0.5, mt.AAF, 1 - mt.AAF)
)
```
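The two steps above can be checked by hand on a single site. This plain-Python sketch (not Hail; function names are hypothetical, diploid calls assumed) mirrors them: the AAF denominator counts only non-missing genotypes, then the frequency is folded onto [0, 0.5].

```python
from typing import List, Optional

def alt_allele_freq(n_alt_alleles: List[Optional[int]]) -> Optional[float]:
    """Sum of alt alleles divided by 2 x the number of non-missing (diploid) genotypes."""
    called = [n for n in n_alt_alleles if n is not None]
    if not called:
        return None  # no called genotypes: the frequency is undefined
    return sum(called) / (2 * len(called))

def minor_allele_freq(aaf: Optional[float]) -> Optional[float]:
    """Fold the alt allele frequency: AAF if below 0.5, else 1 - AAF."""
    if aaf is None:
        return None
    return aaf if aaf < 0.5 else 1 - aaf

# Alt-allele counts per sample at one site; None marks a missing genotype.
site = [2, 2, 1, None, 2, 1]
aaf = alt_allele_freq(site)   # 8 / (2 * 5)
maf = minor_allele_freq(aaf)  # 1 - AAF, since AAF is above 0.5
```

With five called genotypes the AAF is 0.8 and the MAF is about 0.2. If the denominator instead counted all six entries, as I understand `hl.agg.count()` does when an entry is present but its GT is missing, the AAF would come out as 8/12, roughly 0.667, which is one plausible source of unexpected MAF values.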

Thank you. Can you tell me if this is exactly how variant_qc calculates MAF?

This is how variant_qc computes allele frequencies.