I have an MT (a Hail MatrixTable) whose rows look something like this:
+---------------+---------------+----------------------+
| locus         | alleles       | info.CLNDISDB        |
+---------------+---------------+----------------------+
| locus<GRCh37> | array<str>    | array<str>           |
+---------------+---------------+----------------------+
| 2:47630512    | ["A","AG"]    | ["A", "A|B"]         |
| 2:47690234    | ["T","TAATG"] | ["B"]                |
| 2:47693860    | ["T","TA"]    | ["A"]                |
| 2:47705430    | ["TTAA","T"]  | ["B|C", "C|D"]       |
| 2:48026310    | ["C","CTA"]   | ["B", "A|C"]         |
+---------------+---------------+----------------------+
I want to keep all the rows where the string 'B' appears anywhere in one of the info.CLNDISDB elements. My current query uses the .contains() function as follows:
mt.filter_rows(
    (~hl.is_missing(mt.info['CLNDISDB'])) &
    (mt.info['CLNDISDB'].contains('B'))
)
But I only get rows 2 and 5, when I want all 4 rows that match 'B' somewhere in info.CLNDISDB (i.e. rows 1, 2, 4 and 5). Is there a way to do a partial match within the .contains() function, say a substring or regex match against each element of a string array expression?
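
To make the difference concrete, here is a plain-Python sketch (not Hail code) of the behaviour I get versus the behaviour I want, using the CLNDISDB values from the table above. The names `rows`, `exact_match` and `substring_match` are only illustrative.

```python
# Plain-Python model of the info.CLNDISDB column (illustration only, not Hail).
# Keys are the 1-based row numbers from the table above.
rows = {
    1: ["A", "A|B"],
    2: ["B"],
    3: ["A"],
    4: ["B|C", "C|D"],
    5: ["B", "A|C"],
}


def exact_match(values, target):
    """True if some element equals target exactly (what my query does now)."""
    return target in values


def substring_match(values, target):
    """True if target occurs inside any element (what I want)."""
    return any(target in value for value in values)


exact_rows = [n for n, values in rows.items() if exact_match(values, "B")]
substring_rows = [n for n, values in rows.items() if substring_match(values, "B")]

print(exact_rows)      # [2, 5]
print(substring_rows)  # [1, 2, 4, 5]
```

I assume the Hail equivalent would test each element of the array and combine the results, perhaps along the lines of mt.info['CLNDISDB'].any(lambda s: s.contains('B')), but I have not confirmed that this is the intended way to do it.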