This is a bit more complicated because it’s not just a structure, it’s an Array[Struct{...}].
If you want to just select the one canonical transcript per variant, then here’s a bit of discourse on that:
Parsing VEP output
If you wanted to remove variants where lof was None for every transcript, then here’s the code to do that:
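# keep=False removes the variants that match the expression (the default, keep=True, would retain only those)
print(vds.filter_intervals(Interval.parse('22'))
      .filter_variants_expr('va.vep.transcript_consequences.forall(tc => isMissing(tc.lof))', keep=False)
      .count_variants())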
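If the "missing for every transcript" test is hard to picture, here is a rough plain-Python sketch of the same logic. It does not use Hail at all: the variant records and IDs are made up for illustration, and all_lof_missing is a hypothetical helper standing in for the forall / isMissing expression.

```python
# Plain-Python illustration only; these variant records are invented for the example.
variants = [
    {"id": "22:100:A:T", "transcript_consequences": [{"lof": None}, {"lof": "HC"}]},
    {"id": "22:200:G:C", "transcript_consequences": [{"lof": None}, {"lof": None}]},
    {"id": "22:300:C:G", "transcript_consequences": []},
]


def all_lof_missing(variant):
    """True when no transcript of this variant carries a lof value."""
    return all(tc["lof"] is None for tc in variant["transcript_consequences"])


# Drop the variants whose lof is missing on every transcript.
kept = [v["id"] for v in variants if not all_lof_missing(v)]
print(kept)
```

Note that in this sketch a variant with no transcripts at all also counts as "missing for every transcript", because Python's all() is true for an empty sequence. It's worth checking how your own data treats that edge case.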
You might want to see the distribution of lof values across all transcripts:
print(vds.filter_intervals(Interval.parse('22'))
.query_variants('variants.flatMap(v => va.vep.transcript_consequences.map(tc => tc.lof)).counter()'))
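For intuition about what that flatMap-then-counter query computes, here is a small plain-Python sketch of the same idea: flatten the per-transcript lof values across all variants, then tally them. Again this is not Hail code; the records are invented, and "HC" / "LC" are just sample values.

```python
from collections import Counter

# Plain-Python illustration only; these variant records are invented for the example.
variants = [
    {"transcript_consequences": [{"lof": None}, {"lof": "HC"}]},
    {"transcript_consequences": [{"lof": "LC"}, {"lof": "HC"}, {"lof": None}]},
    {"transcript_consequences": [{"lof": None}]},
]

# Flatten every transcript's lof value into one stream, then count each distinct value.
lof_counts = Counter(
    tc["lof"] for v in variants for tc in v["transcript_consequences"]
)
print(lof_counts)
```

The result is a mapping from each lof value (including the missing value) to the number of transcripts carrying it, so the counts are per transcript, not per variant.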