Conditional annotate_entries

danchubb · May 13, 2020, 4:32pm

Hi,

I have been using the following code to do some basic filtering:

ab = mt.AD[1] / hl.sum(mt.AD)
filter_condition_ab = (((mt.GT.is_hom_ref() & (ab < 0.1)) |(mt.GT.is_het() & (ab > 0.1) & (ab < 0.9)) | (mt.GT.is_hom_var() & (ab > 0.9))) & (mt.GQ>=30)& (mt.DP>=10))
mt = mt.filter_entries(filter_condition_ab)

Instead of filtering, I would like to try and add an annotation alongside the GT field

e.g.

mt = mt.annotate_entries(GTF = hl.null(hl.tcall) )

How would I do this conditional on the filter as above? i.e. where the entry would be filtered it would instead receive a GTF of ./.

Thanks a lot

Dan

johnc1231 · May 13, 2020, 4:37pm

If I understand correctly, you want to annotate entries that meet the filter condition a certain way, and annotate entries that don’t meet the condition another way. To do that, you should use hl.if_else

mt = mt.annotate_entries(new_field = hl.if_else(filter_condition_ab, ???, ???))

The left set of question marks can be replaced with the annotation you want when the condition is true, and the right set of question marks can be replaced with the annotation you want when the condition is false.

EDIT: Originally, I had forgotten the new_field = in my answer

danchubb · May 13, 2020, 4:47pm

Hi John,

Thanks for the response,

Where would I put the new field in this? say I wanted to add GTF=

where filter condition is true I want to make GTF=hl.null(hl.tcall) and where it is false GTF=mt.GT

mt = mt.annotate_entries(hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

johnc1231 · May 13, 2020, 4:59pm

Ah, my mistake, I left that out. My original answer should have been:

mt = mt.annotate_entries(new_field = hl.if_else(filter_condition_ab, ???, ???))

So what you’re looking for is:

mt = mt.annotate_entries(GTF = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT)

I’m going to edit my original answer for future readers

danchubb · May 13, 2020, 5:07pm

HI,

I gave that a go and I get the following error:

Traceback (most recent call last):
File “”, line 1, in
File “”, line 2, in if_else
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/typecheck/check.py”, line 585, in wrapper
return original_func(*args, **kwargs)
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/expr/functions.py”, line 333, in if_else
indices, aggregations = unify_all(condition, consequent, alternate)
File “/home/cancgene/dchubb/.conda/envs/hail/lib/python3.6/site-packages/hail/expr/expressions/base_expression.py”, line 345, in unify_all
)) from None
hail.expr.expressions.base_expression.ExpressionException: Cannot combine expressions from different source objects.
Found fields from 2 objects:
<hail.matrixtable.MatrixTable object at 0x2ba7fc22d438>: [‘GT’, ‘AD’, ‘GQ’, ‘DP’]
<hail.matrixtable.MatrixTable object at 0x2ba7fccc0550>: [‘GT’]

Any ideas?

Thanks a lot

Dan

tpoterba · May 13, 2020, 5:35pm

can you paste your code? You have an annotate that references a field on a (probably earlier) MT object.

danchubb · May 14, 2020, 7:38am

Thanks Tim, my mistake.

mt = mt.annotate_entries(GTF = hl.if_else(filter_condition_ab, mt.GT,hl.null(hl.tcall))) now does what I want.

In general is there a strategy for ‘logically’ filtering, rather than a destructive filter?

At the moment, I would apply variant and sample filters which would then remove the entries/variants/samples from the MT. Ideally I would like to flag these elements so that they could be ignored by different aggregators but would still be present if I wanted to view the data or change my filters later on? I could keep the original copy but that feels like a messy alternative.

Thanks a lot for the help.

Dan

tpoterba · May 14, 2020, 11:15am

I think the way the gnomAD team does “logical” filters is by annotating entries to add some kind of pas flag: annotate_entries(passes_filters = ...)

Then you can aggregate using hl.agg.filter(mt.passes_filters, ...).

If you want to use preboxed functions like variant_qc and sample_qc on the filtered data, then you can alternatively do something like:

mt = mt.annotate_entries(raw_GT = mt.GT, GT = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

Does that answer the question?

danchubb · May 14, 2020, 4:10pm

Thanks Tim,

It looks like annotate_entries(passes_filters = …) is similar to what I was doing, using a blank GT as the filtered state. I’ll try wrapping in the hl.agg.filter a go, thanks.

I would very much like to use preboxed functions like the variant_QC, can I just make sure I understand what you have there?

mt = mt.annotate_entries(raw_GT = mt.GT, GT = hl.if_else(filter_condition_ab, hl.null(hl.tcall), mt.GT))

First off, I didn’t realize you could define >1 annotation at once, that might come in handy.

So in this case I would have an example of the raw_GT. Once I apply variant and sample QC on this, would it ignore all the blanked genotypes or would the Depth and other metrics still be calculated and would also need to be altered?

A similar thing that I am interested in doing in run variant and sample QC seperately on cases and controls. For example, I would want to remove variants with bad call rates in cases or controls as sometimes one set might have a particular bias.

A quick search has shown me that this has been asked before : Issues with sample and variant QC by group

Actually, I’ll put my question on that thread to keep things separate.

Thanks again,

Dan

Topic		Replies	Views
How to set only genotype to ./. based on different conditions Hail Query & hailctl	1	468	February 12, 2019
Modifying Specific Variant GT Hail Query & hailctl	4	237	December 8, 2023
Assign hom ref genotype to missing values Hail Query & hailctl	2	369	August 17, 2021
Entry filtering semantics question: allele read balance Hail Query & hailctl	12	923	October 31, 2019
Annotate row with sample information based on entry criteria Hail Query & hailctl	2	444	November 13, 2022

Conditional annotate_entries

Related topics