Filter_rows by list of genes

When I filter like this:

filter_condition = ((mt.info.SYMBOL == 'GENE1') | (mt.info.SYMBOL == 'GENE2') | (mt.info.SYMBOL == 'GENE3'))
mt.filter_rows(filter_condition).count()

it works well. How can I filter using a list of genes (e.g. one imported as a table) instead of specifying each gene individually?

Great question, try this:

mt = mt.filter_rows(hl.literal(['GENE1', 'GENE2',...]).contains(mt.info.SYMBOL))

If you have a table you could do this:

genes = hl.import_table(...).gene_name.collect()
mt = mt.filter_rows(hl.literal(genes).contains(mt.info.SYMBOL))

If your table is really big (e.g. it contains loci), there are better ways to do this, but for small sets (like thousands of genes), this approach usually works better.
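
To make the logic concrete: the hl.literal(...).contains(...) filter is a membership test against the collected gene list, which gives the same result as chaining == comparisons with |. Here is a minimal plain-Python sketch of that idea, not Hail code. The inline file contents, the rows, and the helper names load_genes and filter_by_genes are all made up for illustration; only the gene_name column name comes from the snippet above.

```python
import csv
import io

# Hypothetical contents of a one-column, tab-separated gene list file.
GENE_TSV = "gene_name\nGENE1\nGENE2\nGENE3\n"


def load_genes(text, column="gene_name"):
    """Collect one column of a tab-separated table into a set."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[column] for row in reader}


def filter_by_genes(rows, genes):
    """Keep rows whose SYMBOL is in genes (same result as chaining == with |)."""
    return [row for row in rows if row["SYMBOL"] in genes]


genes = load_genes(GENE_TSV)

# Made-up rows standing in for the mt.info.SYMBOL values.
rows = [{"SYMBOL": "GENE1"}, {"SYMBOL": "OTHER"}, {"SYMBOL": "GENE3"}]
kept = filter_by_genes(rows, genes)
```

In this sketch, kept holds only the GENE1 and GENE3 rows. Adding a gene means adding a line to the list, not another clause to the filter.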

Thanks - that works! I was trying with hl.eval_typed instead of hl.literal.