Hi all, I’m relatively new to Hail and am having some difficulty filtering a MatrixTable by rows. I’ve read in a vcf and done some basic QC, giving me a MatrixTable I’ve named mts. I have 3 samples, and I need to remove all rows where GT is not 0/0 for all samples. When I run mts.GT.show() I can clearly see the values for each sample.
The MatrixTable interface intentionally doesn’t let you query single entries by column value. Instead, it forces you to write your pipeline in terms of computations applied to all entry values, expressed as aggregations.
Here are two ways to remove sites that are 0/0 at every sample.
Is there a way to do the opposite, so remove all sites that are NOT 0/0? I’ve tried using is_het_ref() and mt.GT.is_hom_ref() == False, but both seem to give me the same output as mts = mts.filter_rows(hl.agg.all(mt.GT.is_hom_ref()), keep=False)
Update: f = mts.filter_rows(mts.variant_qc.AC[0] == 6) seems to work
In principle this seems to work, as mtf.count() gives (4, 1680) and mtf2.count() gives (4, 517). However, I still have sites left whose genotypes are a combination of 0/0 and NA, so I assume that NA genotypes are the problem. Is there a way to remove those as well?