Filtering out samples using hl.is_nan

hhx037 · March 1, 2019, 7:36pm

I want to remove samples with missing phenotypes from the matrix table, and to do so I use mt = mt.filter_cols(~hl.is_nan(mt.pheno))
It does appear to exclude samples. But if I test the opposite and keep only the samples with a missing phenotype, mt = mt.filter_cols(hl.is_nan(mt.pheno)), I get 0 when I count the columns. So I’m worried it’s not working the way I think it is, and that the exclusion I see does not actually correspond to what I intend to do.

Could anyone explain it to me please?

tpoterba · March 2, 2019, 6:44pm

NaN and missing are totally different things. mt = mt.filter_cols(~hl.is_nan(mt.pheno)) is actually doing what you want, though, by accident: when the filter condition evaluates to missing, the column is removed. Since hl.is_nan(mt.pheno) is missing when mt.pheno is missing (and negating it with ~ leaves it missing), this removes the samples with missing phenotypes.

When you flip it and do mt.filter_cols(hl.is_nan(mt.pheno)), the condition is still missing for the missing phenotypes, and the defined phenotypes aren’t NaN (presumably), so they return false. So everything is filtered out.

hl.is_missing is the right function to use here.
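To make the behaviour above concrete, here is a small plain-Python sketch of the semantics described in this thread. It does not call Hail: None stands in for Hail's missing value, and the helper names (is_nan, is_missing, logical_not, filter_cols) are hypothetical stand-ins that only mimic how the corresponding Hail operations are described here. The Hail expressions in the comments are the ones from this thread, plus hl.is_defined as the positive form of the same check.

```python
import math
from typing import Callable, List, Optional

# Model only: None plays the role of Hail's "missing";
# float("nan") would be an ordinary, defined value.
Pheno = Optional[float]


def is_nan(x: Pheno) -> Optional[bool]:
    """Mimics hl.is_nan: a missing input gives a missing result."""
    return None if x is None else math.isnan(x)


def is_missing(x: Pheno) -> bool:
    """Mimics hl.is_missing: always a defined True/False."""
    return x is None


def logical_not(b: Optional[bool]) -> Optional[bool]:
    """Mimics ~ on a boolean expression: missing stays missing."""
    return None if b is None else not b


def filter_cols(phenos: List[Pheno],
                keep: Callable[[Pheno], Optional[bool]]) -> List[Pheno]:
    """Mimics filter_cols: keep a column only when the condition is True.
    Both False and missing conditions drop the column."""
    return [p for p in phenos if keep(p) is True]


phenos: List[Pheno] = [1.5, None, 0.2, None, 3.0]

# mt.filter_cols(~hl.is_nan(mt.pheno)): missing phenotypes are dropped
# only because the condition itself is missing for them.
not_nan = filter_cols(phenos, lambda p: logical_not(is_nan(p)))

# mt.filter_cols(hl.is_nan(mt.pheno)): missing -> dropped, defined -> False.
only_nan = filter_cols(phenos, is_nan)

# mt.filter_cols(hl.is_missing(mt.pheno)): keeps the missing phenotypes.
only_missing = filter_cols(phenos, is_missing)

# mt.filter_cols(hl.is_defined(mt.pheno)): the intended exclusion.
only_defined = filter_cols(phenos, lambda p: not is_missing(p))

print(len(not_nan), len(only_nan), len(only_missing), len(only_defined))
```

In this model the first and last filters keep the same three samples, which is why the original filter looked correct. If a phenotype could also be a real NaN, the two checks would need to be combined, for example hl.is_defined(mt.pheno) & ~hl.is_nan(mt.pheno).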

hhx037 · March 3, 2019, 9:34am

Oh, I see, makes sense now! Thank you
