The default variantqc functionality doesn’t take the sex of the samples into account, and I found this meant that some suboptimal things were happening, like I was losing everything on the Y when I filtered variants by call rate (since most females didn’t have calls) and the MAC wasn’t accurate in hemizygous regions.
So, since I had a reported sex for my samples (and imputed sex that matched it, with few exceptions), I wrote a few of my own variant calculations:
annotateglobal expr -c ’
global.Num_males = samples.filter(s => sa.pheno.Gender == “Male”).count(),
global.Num_females = samples.filter(s => sa.pheno.Gender == “Female”).count()’
annotatevariants expr -c ’
va.Male_hets = gs.filter(g => sa.pheno.Gender == “Male” && g.isHet).count(),
va.Male_homvars = gs.filter(g => sa.pheno.Gender == “Male” && g.isHomVar).count(),
va.Male_calls = gs.filter(g => sa.pheno.Gender == “Male” && g.isCalled).count(),
va.Female_hets = gs.filter(g => sa.pheno.Gender == “Female” && g.isHet).count(),
va.Female_homvars = gs.filter(g => sa.pheno.Gender == “Female” && g.isHomVar).count(),
va.Female_calls = gs.filter(g => sa.pheno.Gender == “Female” && g.isCalled).count()’
annotatevariants expr -c ’
va.callRate = if (v.inYNonPar) ( va.Male_calls / global.Num_males ) else if (v.inXNonPar) ( (va.Male_calls + 2va.Female_calls) / (global.Num_males + 2global.Num_females) ) else ( (va.Male_calls + va.Female_calls) / (global.Num_males + global.Num_females) ),
va.MAC = if (v.inYNonPar) ( va.Male_homvars ) else if (v.inXNonPar) ( va.Male_homvars + va.Female_hets + 2va.Female_homvars ) else ( va.Male_hets + 2va.Male_homvars + va.Female_hets + 2va.Female_homvars ),
va.MAF = if (v.inYNonPar) ( va.Male_homvars / va.Male_calls) else if (v.inXNonPar) ( (va.Male_homvars + va.Female_hets + 2va.Female_homvars) / (va.Male_calls + 2va.Female_calls) ) else ( (va.Male_hets + 2va.Male_homvars + va.Female_hets + 2va.Female_homvars) / (2va.Male_calls + 2*va.Female_calls) )’ \