Conditional annotate_entries

Hi,

I have been using the following code to do some basic filtering:

ab = mt.AD[1] / hl.sum(mt.AD)
filter_condition_ab = (((mt.GT.is_hom_ref() & (ab < 0.1)) |(mt.GT.is_het() & (ab > 0.1) & (ab < 0.9)) | (mt.GT.is_hom_var() & (ab > 0.9))) & (mt.GQ>=30)& (mt.DP>=10))
mt = mt.filter_entries(filter_condition_ab)

Instead of filtering, I would like to try and add an annotation alongside the GT field

e.g.

mt = mt.annotate_entries(GTF = hl.null(hl.tcall) )

How would I do this conditional on the filter as above? i.e. where the entry would be filtered it would instead receive a GTF of ./.

Thanks a lot

Dan

If I understand correctly, you want to annotate entries that meet the filter condition a certain way, and annotate entries that don’t meet the condition another way. To do that, you should use hl.if_else

mt = mt.annotate_entries(new_field = hl.if_else(filter_condition_ab, ???, ???))

The left set of question marks can be replaced with the annotation you want when the condition is true, and the right set of question marks can be replaced with the annotation you want when the condition is false.

EDIT: Originally, I had forgotten the new_field = in my answer

Hi John,

Thanks for the response,

Where would I put the new field in this? say I wanted to add GTF=

where filter condition is true I want to make GTF=hl.null(hl.tcall) and where it is false GTF=mt.GT

mt = mt.annotate_entries(hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

Ah, my mistake, I left that out. My original answer should have been:

mt = mt.annotate_entries(new_field = hl.if_else(filter_condition_ab, ???, ???))

So what you’re looking for is:

mt = mt.annotate_entries(GTF = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT)

I’m going to edit my original answer for future readers

HI,

I gave that a go and I get the following error:

Traceback (most recent call last):
File “”, line 1, in
File “”, line 2, in if_else
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/typecheck/check.py”, line 585, in wrapper
return original_func(*args, **kwargs)
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/expr/functions.py”, line 333, in if_else
indices, aggregations = unify_all(condition, consequent, alternate)
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/expr/expressions/base_expression.py”, line 345, in unify_all
)) from None
hail.expr.expressions.base_expression.ExpressionException: Cannot combine expressions from different source objects.
Found fields from 2 objects:
<hail.matrixtable.MatrixTable object at 0x2ba7fc22d438>: [‘GT’, ‘AD’, ‘GQ’, ‘DP’]
<hail.matrixtable.MatrixTable object at 0x2ba7fccc0550>: [‘GT’]

Any ideas?

Thanks a lot

Dan

can you paste your code? You have an annotate that references a field on a (probably earlier) MT object.

Thanks Tim, my mistake.

mt = mt.annotate_entries(GTF = hl.if_else(filter_condition_ab, mt.GT,hl.null(hl.tcall))) now does what I want.

In general is there a strategy for ‘logically’ filtering, rather than a destructive filter?

At the moment, I would apply variant and sample filters which would then remove the entries/variants/samples from the MT. Ideally I would like to flag these elements so that they could be ignored by different aggregators but would still be present if I wanted to view the data or change my filters later on? I could keep the original copy but that feels like a messy alternative.

Thanks a lot for the help.

Dan

I think the way the gnomAD team does “logical” filters is by annotating entries to add some kind of pas flag: annotate_entries(passes_filters = ...)

Then you can aggregate using hl.agg.filter(mt.passes_filters, ...).

If you want to use preboxed functions like variant_qc and sample_qc on the filtered data, then you can alternatively do something like:

mt = mt.annotate_entries(raw_GT = mt.GT, GT = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

Does that answer the question?

Thanks Tim,

It looks like annotate_entries(passes_filters = …) is similar to what I was doing, using a blank GT as the filtered state. I’ll try wrapping in the hl.agg.filter a go, thanks.

I would very much like to use preboxed functions like the variant_QC, can I just make sure I understand what you have there?

mt = mt.annotate_entries(raw_GT = mt.GT, GT = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

First off, I didn’t realize you could define >1 annotation at once, that might come in handy.

So in this case I would have an example of the raw_GT. Once I apply variant and sample QC on this, would it ignore all the blanked genotypes or would the Depth and other metrics still be calculated and would also need to be altered?

A similar thing that I am interested in doing in run variant and sample QC seperately on cases and controls. For example, I would want to remove variants with bad call rates in cases or controls as sometimes one set might have a particular bias.

A quick search has shown me that this has been asked before : Issues with sample and variant QC by group

Actually, I’ll put my question on that thread to keep things separate.

Thanks again,

Dan