Hi all,
I am working with the UK Biobank (UKB) WES dataset in Hail, on a gene located on the X chromosome.
Before quality control, I have 2.5k sites, all in a non-pseudoautosomal region (checked with mt_x_m.locus.in_x_nonpar()).
The calls from DeepVariant are all diploid:
print(mt_x_m_par.aggregate_entries(hl.agg.counter(mt_x_m_par.GT.ploidy)))
{2: 335236742, None: 4499900}
But I still get quite a few heterozygotes:
print(mt_x_m.aggregate_entries(hl.agg.counter(mt_x_m.GT.is_het())))
{False: 335138969, True: 97773, None: 4499900}
print(mt_x_m.aggregate_entries(hl.agg.counter(mt_x_m.GT.is_hom_var())))
{False: 335124731, True: 112011, None: 4499900}
What might explain these het calls (which should not happen)? Would it be best to remove these GT calls?
Thanks in advance,