I am having different results using two different approaches for Allele read balance
Approach 1:
ab = mt_sqc2.AD[1] / hl.sum(mt_sqc2.AD)
filter_condition_ab = ((mt_sqc2.GT.is_hom_ref() & (ab <= 0.1)) |
(mt_sqc2.GT.is_het() & (ab >= 0.25) & (ab <= 0.75)) |
(mt_sqc2.GT.is_hom_var() & (ab >= 0.9)))
fraction_filtered = mt_sqc2.aggregate_entries(hl.agg.fraction(~filter_condition_ab))
print(f'Filtering {fraction_filtered * 100:.2f}% entries out of downstream analysis.')
mt_sqc2 = mt_sqc2.filter_entries(filter_condition_ab)
Result - Filtering 0.36% entries out of downstream analysis.
Approach 2:
ab = mt_sqc2.AD[1] / hl.sum(mt_sqc2.AD)
filter_condition_ab = (
hl.case(missing_false=True)
.when(mt_sqc2.GT.is_hom_ref(), ab <= 0.1)
.when(mt_sqc2.GT.is_het(), (ab >= 0.25) & (ab <= 0.75))
.when(mt_sqc2.GT.is_hom_var(), ab >= 0.9)
.default(True) # keep everything else
)
fraction_filtered = mt_sqc2.aggregate_entries(hl.agg.fraction(~filter_condition_ab))
print(f'Filtering {fraction_filtered * 100:.2f}% entries out of downstream analysis.')
mt_sqc2 = mt_sqc2.filter_entries(filter_condition_ab)
Result - Filtering 0.26% entries out of downstream analysis.
Note that 0.58% missing data is already present in the original MatrixTable.