Changing sample annotations based on an expression


#1

Hi,

I've been looking, unsuccessfully, for the right way to change an annotation based on an expression.
There are plenty of examples of how to create new annotations and how to filter, but nothing in the tutorials on updating existing annotations.

Example:

test.query_samples('samples.map(s => sa.info.population).counter()')
{u'YRI': 167L, u'CHB': 84L, u'ASW': 83L, u'TSI': 88L, u'MEX': 77L, u'MKK': 171L, None: 992L, u'LWK': 90L, u'CEU': 165L, u'CHD': 85L, u'JPT': 86L, u'GIH': 88L}

I’m trying to add a population for the 992 samples for which it’s missing.

Filtering on this criterion is straightforward: subset = test.filter_samples_expr('sa.info.population.isMissing()')
Replacing the annotation for every sample in the subset is also straightforward: subset = subset.annotate_samples_expr('sa.info.population = "Malmo"')

But how can I do something like the following expression directly?

if sa.info.population.isMissing() sa.info.population = "Cohort1" else do nothing

Thanks,

Steph


#2

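You've almost got it. The key is that the "annotation expression" syntax looks like name = expr, so every sample gets assigned a value and there is no "do nothing" branch: the else branch simply assigns the existing value back. In this case: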

sa.info.population = if (sa.info.population.isMissing()) "cohort1" else sa.info.population
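To make the per-sample behaviour concrete, here is a plain-Python model of what that expression does. It is not Hail code: it assumes, purely for illustration, that each sample's annotations are a dict and that a missing population is None (as in the counter output above); the sample IDs and the function name are hypothetical. In Hail itself the expression string would be passed to annotate_samples_expr on the full dataset (test) rather than on the filtered subset, so the 992 updated samples stay alongside the already-annotated ones.

```python
from collections import Counter

# Plain-Python model (not Hail) of the annotation expression
#   sa.info.population = if (sa.info.population.isMissing()) "Cohort1" else sa.info.population
# Hypothetical representation: one dict per sample, None meaning "missing".


def fill_missing_population(samples, default="Cohort1"):
    """Return new sample annotations with missing populations set to `default`."""
    updated = []
    for sa in samples:
        population = sa["info"]["population"]
        # "if missing" -> default; "else" -> assign the existing value back
        new_population = default if population is None else population
        # Copy the nested dicts so the input annotations are left untouched
        info = dict(sa["info"])
        info["population"] = new_population
        new_sa = dict(sa)
        new_sa["info"] = info
        updated.append(new_sa)
    return updated


samples = [
    {"s": "NA1", "info": {"population": "YRI"}},
    {"s": "NA2", "info": {"population": None}},
    {"s": "NA3", "info": {"population": "CEU"}},
    {"s": "NA4", "info": {"population": None}},
]

result = fill_missing_population(samples)
print(Counter(sa["info"]["population"] for sa in result))
```

Every sample is kept and assigned a value; only those whose population was missing end up with a different one, which is exactly the "else do nothing" behaviour asked for.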

#3

I see, that makes perfect sense now. Thank you! There's quite a lot to learn, but you've designed a fantastic tool, and it's definitely worth it!

Cheers,

Steph


#4

We’re working on embedding the expression language entirely in Python (much like Pandas) for Hail 0.2, which will hopefully make it a lot more accessible. I sympathize with the steep learning curve right now!