Hi,
I’m trying to find all the values that va.vep.transcript_consequences.lof takes in my variant dataset, but I can’t figure out how.
I think an expression such as 'va.vep.transcript_consequences.map(t => t.lof).toSet()' would work, but which function should I pass it to?
Thanks,
Stephane
I think you want query_variants. Here's an example:
# Collect every distinct lof value across all transcript consequences
csq_set = vds.query_variants(
    '''variants
    .flatMap(v => va.vep.transcript_consequences.map(t => t.lof))
    .collectAsSet()''')
print(csq_set)
Alternatively, you can get out the counts of each unique value:
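from collections import Counter

# Count how many times each lof value occurs
csqs = vds.query_variants(
    '''variants
    .flatMap(v => va.vep.transcript_consequences.map(t => t.lof))
    .counter()''')
# Wrap the result in a Counter to list values from most to least common
print(Counter(csqs).most_common())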
Yes, query_variants is what I was looking for, thank you.
I still don’t fully understand how the expressions work though… none of the following (getting the set of chromosomes) work:
vds.query_variants('v.map(v => v.contig).collectAsSet()')
vds.query_variants('''variants.flatMap(v => v.contig).collectAsSet()''')
What am I doing wrong?
The expression language is extremely confusing. query_variants exposes one top-level object, variants, an Aggregable. Aggregables are unordered distributed collections of things, like rows or columns of the VDS or its annotation tables.
The most confusing thing about them is that they carry an implicit “scope” around: extra variables you can access for free and can’t map away. In query_variants, the variants aggregable is an Aggregable[Variant] that has v and va in its scope.
Aggregables support ‘aggregator’ operations like count, collect, stats, counter, and more. These functions work on the elements in the aggregable, so usually you’ll first need to transform the elements with map, filter, or flatMap. The difference between map and flatMap is that map changes elements one-to-one, while flatMap can change the number of elements in the Aggregable: the function supplied returns an array, and each item of that array becomes its own element.
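To make the map versus flatMap distinction concrete, here is a rough analogy in plain Python lists. It is not Hail code: the variants list, its lof values, and the two helper functions are made up for illustration, and a real Aggregable is distributed and unordered.

```python
# Plain-Python analogy only; none of these names come from Hail.
# Each hypothetical "variant" carries a list of per-transcript lof values.
variants = [
    {"contig": "1", "lof": ["HC", "LC"]},
    {"contig": "1", "lof": []},
    {"contig": "2", "lof": ["HC", "HC"]},
]


def map_elements(elements, f):
    """Like map: exactly one output element per input element."""
    return [f(e) for e in elements]


def flat_map_elements(elements, f):
    """Like flatMap: f returns a list, and every item becomes an element."""
    return [item for e in elements for item in f(e)]


contigs = map_elements(variants, lambda v: v["contig"])
lofs = flat_map_elements(variants, lambda v: v["lof"])

# map keeps the element count; flatMap changes it here.
print(len(variants), len(contigs), len(lofs))  # 3 3 4
# Taking the set afterwards plays the role of collectAsSet().
print(sorted(set(contigs)), sorted(set(lofs)))  # ['1', '2'] ['HC', 'LC']
```

The second variant has no transcripts, so it contributes nothing after the flatMap step, which is why the element count is free to go up or down.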
For the contigs, you’ll want to use map, not flatMap, because v.contig is a single value rather than an array. If you swap that out in your second line, it’ll work!
The first is incorrect because v is not a top-level variable in query_variants; the expression has to start from variants.
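Putting that together, the corrected contig query might look like the sketch below. It assumes vds is the same VariantDataset used earlier in the thread; the contig_set helper and the CONTIG_SET_EXPR name are only illustrative, not part of Hail.

```python
# Start from the top-level `variants` aggregable and use map, since
# v.contig is a single value per variant.
CONTIG_SET_EXPR = 'variants.map(v => v.contig).collectAsSet()'


def contig_set(vds):
    """Run the contig query on `vds` (assumed to expose query_variants)."""
    return vds.query_variants(CONTIG_SET_EXPR)
```

Calling print(contig_set(vds)) should then show the distinct contigs in the dataset.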
Thank you, that is very helpful.