I want to remove samples with missing phenotypes from the matrix, and to do so I use
mt = mt.filter_cols(~hl.is_nan(mt.pheno))
It does appear to exclude samples, but if I test the opposite and keep only the samples with a missing phenotype,
mt = mt.filter_cols(hl.is_nan(mt.pheno))
I get 0 when I count the columns. So I’m worried it’s not working the way I think it is, and that the exclusion I see does not actually correspond to what I intend to do.
Could anyone explain it to me please?
nan and missing are totally different: nan is a defined floating-point value, while missing means there is no value at all.
mt = mt.filter_cols(~hl.is_nan(mt.pheno)) is actually doing what you want, though by accident: when the filter condition evaluates to missing, the column is removed. Since hl.is_nan(mt.pheno) is missing when mt.pheno is missing, and negating a missing value still gives missing, this removes the missing phenotypes.
When you flip it and do mt.filter_cols(hl.is_nan(mt.pheno)), the condition is still missing for the missing phenotypes, and the defined phenotypes aren’t nan (presumably), so they return false. So every column is filtered out.
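hl.is_missing is the right function to use here: mt.filter_cols(hl.is_missing(mt.pheno)) keeps only the samples with a missing phenotype, and mt.filter_cols(hl.is_defined(mt.pheno)) removes them.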
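If it helps to see the logic spelled out, here is a small plain-Python model of the behaviour described above. It is not Hail code: None stands in for a missing value, and the helper names (is_nan, negate, is_missing, filter_cols) are made up for illustration, assuming only that a missing input gives a missing result and that a filter keeps a column only when its condition is true.

```python
import math

# None models Hail's "missing"; float("nan") would model a real NaN value.

def is_nan(x):
    # Missing in, missing out; otherwise the ordinary NaN test.
    return None if x is None else math.isnan(x)

def negate(b):
    # Negating a missing boolean is still missing.
    return None if b is None else not b

def is_missing(x):
    # Always returns a defined boolean, even for a missing input.
    return x is None

def filter_cols(phenos, cond):
    # Keep a column only when the condition is exactly True;
    # both False and missing (None) cause the column to be dropped.
    return [p for p in phenos if cond(p) is True]

phenos = [1.2, None, 0.7, None, 3.4]

kept_not_nan = filter_cols(phenos, lambda p: negate(is_nan(p)))
kept_nan = filter_cols(phenos, is_nan)
kept_missing = filter_cols(phenos, is_missing)
kept_defined = filter_cols(phenos, lambda p: not is_missing(p))

print(kept_not_nan)   # [1.2, 0.7, 3.4]  missing dropped "by accident"
print(kept_nan)       # []               nothing is NaN, missing dropped too
print(kept_missing)   # [None, None]     the samples the question was after
print(kept_defined)   # [1.2, 0.7, 3.4]  the explicit way to drop missing
```

The first two results match what was observed in the question: negating the nan test removes the missing samples, while the un-negated test keeps nothing.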
Oh, I see, makes sense now! Thank you