Hello!
I’m building a Hail pipeline to calculate allele frequency statistics by sex and ancestry, aiming to replicate the gnomAD-style table with:
- Allele Count (AC)
- Allele Number (AN)
- Number of Homozygotes (`n_hom_var`)
- Number of Hemizygotes (when applicable)
- Allele Frequency (AF)
Using `hl.agg.call_stats`, I get AC, AN, AF, and the per-allele homozygote counts (`homozygote_count`, from which I take `n_hom_var`) directly, but Hail doesn’t seem to provide a built-in “hemizygous count” column for sex-chromosome variants.
My understanding of the logic is:
- PAR regions: diploid → `n_hom_var` works as usual.
- Non-PAR regions: haploid in XY samples → any alternate allele in these samples needs to be counted as “hemizygous alternate.”
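To make the rule concrete, here is a minimal plain-Python sketch (deliberately not Hail code) of the counting I have in mind for a single biallelic site. The names `Call` and `site_stats` and the `"XX"`/`"XY"` labels are hypothetical and only for illustration. It also assumes that a diploid-encoded alternate call in an XY sample on a non-PAR site counts as one hemizygous allele, which may not match how gnomAD handles such calls. In Hail, I would expect the same predicates to come from `locus.in_x_nonpar()` / `locus.in_y_nonpar()` and `GT.is_non_ref()`, combined with a per-sample sex annotation.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """One sample's genotype at a biallelic site (hypothetical structure)."""
    karyotype: str                       # "XX" or "XY", an assumed sample annotation
    alleles: Optional[Tuple[int, ...]]   # allele indices, e.g. (0, 1); None = missing call


def site_stats(calls: Iterable[Call], non_par: bool) -> dict:
    """Count AC, AN, n_hom_var and n_hemi for one site.

    non_par=True means the site lies in a non-PAR region of chrX or chrY.
    """
    ac = an = n_hom_var = n_hemi = 0
    for call in calls:
        if call.alleles is None:
            continue  # missing calls contribute nothing
        if non_par and call.karyotype == "XY":
            # Haploid case: one chromosome copy, so exactly one allele in AN,
            # even if the variant caller emitted a diploid-looking genotype.
            # Assumption: any alternate allele counts as hemizygous alternate.
            has_alt = any(a > 0 for a in call.alleles)
            an += 1
            ac += int(has_alt)
            n_hemi += int(has_alt)
        else:
            # Diploid case (autosomes, PAR, or XX samples on chrX).
            n_alt = sum(a > 0 for a in call.alleles)
            an += len(call.alleles)
            ac += n_alt
            n_hom_var += int(len(call.alleles) == 2 and n_alt == 2)
    af = ac / an if an else None
    return {"AC": ac, "AN": an, "AF": af, "n_hom_var": n_hom_var, "n_hemi": n_hemi}


if __name__ == "__main__":
    example = [
        Call("XX", (0, 1)),
        Call("XX", (1, 1)),
        Call("XY", (1, 1)),   # diploid-encoded call in an XY sample
        Call("XY", (1,)),     # haploid-encoded call
        Call("XY", (0,)),
        Call("XY", None),     # missing
    ]
    print(site_stats(example, non_par=True))
```

The point of the sketch is that, on non-PAR sites, XY samples never contribute to `n_hom_var` and add only one allele each to AN, while XX samples are counted exactly as on autosomes.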
Does this approach make sense, or is there a more optimal way in Hail to compute hemizygous counts that I’m missing?
Thanks in advance!