Filtering rows based on VEP

Hi all, I am trying to obtain a subset of my MatrixTable, which has been annotated with VEP.
The most severe consequence is accessed by:
vep_most_severe_consequence = mt.most_severe_consequence
and it is a string expression.
I tried:

vep_most_severe_consequence = mt.most_severe_consequence
mt.filter_rows(hl.is_defined(vep_most_severe_consequence == 'frameshift_variant'))

But that did not seem to work. Can I get advice on how to get this to work? Thank you so much!

You don't need hl.is_defined; that only checks whether a value is missing or not. I think you just want:

mt = mt.filter_rows(mt.most_severe_consequence == 'frameshift_variant')

Thanks! If I want to obtain a subset by multiple strings, will the or operator work?

Try:

mt = mt.filter_rows(hl.set(['frameshift_variant', 'stop_gained']).contains(mt.most_severe_consequence))

In general, to combine boolean conditions in Hail, write: (xx == yy) | (zz == aa). The | is "or", and the parentheses are required because of operator precedence.

That works too! Just to check, would it be similar to:
mt.filter_rows(mt.most_severe_consequence == 'frameshift_variant' | mt.most_severe_consequence == 'stop_gained')

No, when using | you *must* wrap both sides in parentheses:

mt = mt.filter_rows(
    (mt.most_severe_consequence == 'frameshift_variant') | (mt.most_severe_consequence == 'stop_gained')
)
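
For anyone curious why the parentheses matter: this is plain Python operator precedence, not something specific to Hail. In Python, | binds more tightly than ==, so the unparenthesized form is read as a chained comparison rather than an "or" of two comparisons. Here is a small sketch using only the standard library's ast module to show how Python parses each form. The names a and b are just placeholders standing in for the Hail expressions.

```python
import ast

# Without parentheses: Python reads this as the chained comparison
# a == ('x' | b) == 'y', because | binds more tightly than ==.
unparenthesized = ast.parse("a == 'x' | b == 'y'", mode="eval").body

# With parentheses: a single | whose two operands are the comparisons.
parenthesized = ast.parse("(a == 'x') | (b == 'y')", mode="eval").body

print(type(unparenthesized).__name__)  # Compare
print(type(parenthesized).__name__)    # BinOp
print(ast.dump(unparenthesized))
```

The first form therefore tries to evaluate 'x' | b before any comparison happens, which is not what you want in a filter.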
