Filtering out samples using hl.is_nan


#1

I want to remove samples with missing phenotypes from the matrix table, and to do so I use mt = mt.filter_cols(~hl.is_nan(mt.pheno)).
It does appear to exclude samples. But when I test the opposite and keep only the samples with a missing phenotype, using mt = mt.filter_cols(hl.is_nan(mt.pheno)), I get 0 when I count the columns. So I’m worried that it’s not working the way I think it is, and that the exclusion I see does not truly correspond to what I intend to do.

Could anyone explain it to me please?


#2

nan and missing are totally different: nan is a defined floating-point value, while missing means there is no value at all. mt = mt.filter_cols(~hl.is_nan(mt.pheno)) is actually doing what you want, though, by accident. When the filter condition evaluates to missing, the col is removed. Since hl.is_nan(mt.pheno) is missing when mt.pheno is missing, and negating it with ~ still gives missing, this removes the samples with missing phenotypes.

When you flip it and do mt.filter_cols(hl.is_nan(mt.pheno)), the condition is still missing for the missing phenotypes, and the defined phenotypes aren’t nan (presumably), so they return false. So every column is filtered out.

hl.is_missing is the right function to use here, since it returns true or false and never missing.
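
To make the behaviour concrete, here is a minimal pure-Python sketch of the semantics described above. It is only a model, not Hail's implementation: None stands in for a missing value, and is_nan, is_missing, logical_not and filter_cols are hypothetical stand-ins for hl.is_nan, hl.is_missing, ~ and mt.filter_cols.

```python
import math

# Toy model only: None plays the role of a missing value.
# These helpers are hypothetical stand-ins, not Hail functions.

def is_nan(x):
    """Model of hl.is_nan: a missing input gives a missing result."""
    return None if x is None else math.isnan(x)

def is_missing(x):
    """Model of hl.is_missing: always a defined True/False."""
    return x is None

def logical_not(b):
    """Model of ~ on a boolean: missing stays missing."""
    return None if b is None else not b

def filter_cols(values, predicate):
    """Keep a value only if the predicate is exactly True.
    Both False and missing (None) drop the value."""
    return [v for v in values if predicate(v) is True]

# Five samples, two of them with a missing phenotype.
phenos = [1.2, None, 0.7, None, 3.4]

# ~is_nan: missing phenotypes give a missing condition, so they are dropped.
kept = filter_cols(phenos, lambda p: logical_not(is_nan(p)))

# is_nan: defined phenotypes give False, missing ones give missing.
flipped = filter_cols(phenos, is_nan)

# is_missing: the condition is always defined, so both directions work.
missing_only = filter_cols(phenos, is_missing)
defined_only = filter_cols(phenos, lambda p: not is_missing(p))

print(len(kept), len(flipped), len(missing_only), len(defined_only))
```

In Hail itself, the corresponding filters would be mt.filter_cols(hl.is_missing(mt.pheno)) to keep only the samples with a missing phenotype and mt.filter_cols(hl.is_defined(mt.pheno)) to remove them.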


#3

Oh, I see, makes sense now! Thank you :slight_smile: