I have some NA values in mt.GT and I need to filter out those rows. I’ve tried using isNA but that gives me type errors. Is there a way to do this?
What TypeError are you getting?
TypeError: __init__: parameter ‘value’: expected hail.ir.base_ir.IR, found hail.expr.expressions.typed_expressions.CallExpression:
I tried using: ir.IsNA(sibs.GT) == False
Ah yeah, you should be using hl.is_missing. You’re never going to want to use any of the ir methods; those are all internal things.
hl.is_missing doesn’t seem to work
Can you share the code you’re trying?
I’m trying the following:
mt = mt.annotate_rows(is_NA = hl.agg.count_where(hl.is_missing(mt.GT)))
And what seems to be the problem?
All of the values in the is_NA row field are 0, but when I run mt.GT.show() I can clearly see some GT values of NA.
If you give me something to reproduce with I can look at it. If I grab 1000 genomes data and try what you wrote it works just fine.
How do you know all the values in is_NA are 0? I’d filter to one variant where you know you have missing GTs and try to see if you can get it for just that row.
@spencer, I suspect you have filtered entries; check the docs for filter_entries and compute_entry_filter_stats. Entry filtering is useful for saying “these genotypes should be considered excluded from my dataset” whereas missing is useful for saying “these genotypes have an unknown/unascertained value.”