Generate variant list in raw string format


#1

I’m trying to generate a list of all variants in a VDS in their string representation. I tried the following, which works, but I’m wondering if there is a more efficient way to do it:

list = map(lambda row: str(row.v), vds.variants_table().select('v').collect())

Thanks!


#2

One unfortunate consequence of the py4j package we use to communicate between Java and Python is terrible performance: converting Java objects to Python takes time roughly proportional to the number of objects converted, with a very large constant factor.

So –

this:

list = map(lambda row: str(row.v), vds.variants_table().select('v').collect())

Is going to be a lot slower than this:

list = [row.v for row in vds.variants_table().select('v').annotate('v = str(v)').collect()]

Which will in turn be even slower than:

list = vds.query_variants('variants.map(v => str(v)).collect()')

Which will still be slower than this horrible piece of code:

list = vds.query_variants('variants.map(v => str(v)).collect().mkString(",")').split(',')
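
To see why the last version wins, here is a toy model of the cost, not Hail or py4j code. `ToyBridge` is a made-up class that just counts how many objects get converted. Collecting a list costs one conversion per element, while a single joined string costs one conversion in total, and the split happens on the Python side.

```python
class ToyBridge(object):
    """Hypothetical stand-in for a Java/Python bridge; it only counts conversions."""

    def __init__(self):
        self.conversions = 0

    def to_python(self, obj):
        # Each object handed across the bridge is converted separately.
        self.conversions += 1
        return obj

    def collect(self, items):
        # Collecting a list converts every element one by one.
        return [self.to_python(item) for item in items]


# Made-up variant strings, for illustration only.
variants = ["1:%d:A:T" % pos for pos in range(1000, 1005)]

# Approach 1: convert each string individually.
per_element = ToyBridge()
result_a = per_element.collect(variants)

# Approach 2: join on the "Java" side, convert one string, split in Python.
single_string = ToyBridge()
result_b = single_string.to_python(";".join(variants)).split(";")

assert result_a == result_b
print("per element: %d, single string: %d"
      % (per_element.conversions, single_string.conversions))
```

The join/split trick only round-trips cleanly when the delimiter cannot occur inside any element. If a variant string can itself contain a comma (for example, if alternate alleles of a multi-allelic variant are comma-separated), pick a different separator in both `mkString` and `split`.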

#3

First, a big thanks for your amazingly quick response!

It’s great to learn those differences, and I’ll definitely take the best “verbose” solution :)

Admittedly, I’m still learning the ropes of HQL.


#4

This stuff isn’t easy to learn – the engineers on our team are really the only ones who know these tricks.

This is one particular bottleneck that we have ideas for how to fix, but those fixes aren’t compatible with our infrastructure in 0.1.