Variant annotation in MatrixTable

kousikbioinfo · June 26, 2020, 12:25am

Hi,
I have a table (ht); here is the output of ht.describe():

----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'chr': str
    'start': int32
    'end': int32
    'range': str
    'window_id': str
    'gene_name': str
    'n_variants': int32
    'gene_id': str
    'interval': interval<locus<GRCh38>>
----------------------------------------
Key: ['interval']

I want to annotate the variants in my mt by creating a struct row field that holds all of this information. One variant can overlap multiple intervals, so I probably need to create an array<struct{...}>.
I am not sure how to do that. Could anybody help me, please?

tpoterba · June 29, 2020, 11:45am

The familiar syntax below is almost what you want:

mt = mt.annotate_rows(interval_data = ht[mt.row_key])

However, this will give you one distinct struct per row of the matrix table. Instead, you can call through to the Table.index method, and use the all_matches flag:

mt = mt.annotate_rows(interval_data = ht.index(mt.row_key, all_matches=True))
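To make the difference between the two calls concrete, here is a minimal pure-Python sketch of the idea. It does not use Hail: the `windows` list and the helpers `index_single` and `index_all` are hypothetical stand-ins for an interval-keyed table, for `ht[key]`, and for `ht.index(key, all_matches=True)` respectively.

```python
# Toy model only: these names are illustrative and are not part of Hail.
# Each window is (start, end, gene) on a single contig, with the start
# included and the end excluded.
windows = [
    (100, 2000, "GeneA"),
    (1500, 3000, "GeneB"),
    (5000, 6000, "GeneC"),
]


def matches(pos):
    """Return every window that contains the position."""
    return [w for w in windows if w[0] <= pos < w[1]]


def index_single(pos):
    """Stand-in for a single-match lookup: at most one window per position.

    This toy returns the first hit; which row Hail would pick among
    several matches is not modeled here.
    """
    hits = matches(pos)
    return hits[0] if hits else None


def index_all(pos):
    """Stand-in for all_matches=True: every overlapping window is kept.

    This toy returns an empty list when nothing matches; how Hail
    represents "no match" is not modeled here.
    """
    return matches(pos)


single = index_single(1600)
everything = index_all(1600)
print(single)
print(everything)
```

Position 1600 falls inside two windows, so the single-match lookup keeps only one of them while the all-matches lookup keeps both, which is why the second form yields an array per row.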

kousikbioinfo · June 29, 2020, 1:10pm

Thank you very much @tpoterba. That was exactly what I wanted.

I am having another issue now -

my ht looks like -

I have a variant chr1:948594 in my mt which, surprisingly, was not annotated with the ht information, although the position is present in the ht (see the last line: the end position is the actual variant position).

Here is my code -

mt = mt.annotate_rows(windows = ht.index(mt.row_key[0], all_matches=True))
# mt.rows() returns a Table, so give it its own name instead of reusing mt
rows_ht = mt.rows()
rows_ht.select(rows_ht.windows).show()

Which gives me the output like -

As you can see the last variant was not correctly annotated.

tpoterba · June 29, 2020, 1:27pm

The printed representation of these intervals uses interval notation, which shows that they include the left endpoint but exclude the right endpoint. You’ll need to re-create these intervals to include the right endpoint if you want chr1:946174-chr1:948594 to include chr1:948594.
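The endpoint behaviour described above can be illustrated with a small pure-Python sketch. It does not use Hail: the `Interval` class is a hypothetical model, and its `includes_start` / `includes_end` flags are chosen only to mirror the idea of an interval that is closed on the left and open on the right by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Toy interval on one contig (illustrative, not a Hail type)."""
    contig: str
    start: int
    end: int
    includes_start: bool = True   # left endpoint included by default
    includes_end: bool = False    # right endpoint excluded by default

    def contains(self, contig, pos):
        """True if (contig, pos) lies inside the interval."""
        if contig != self.contig:
            return False
        lower_ok = pos > self.start or (self.includes_start and pos == self.start)
        upper_ok = pos < self.end or (self.includes_end and pos == self.end)
        return lower_ok and upper_ok


# Same coordinates, differing only in whether the right endpoint is included.
half_open = Interval("chr1", 946174, 948594)
closed = Interval("chr1", 946174, 948594, includes_end=True)

half_open_hit = half_open.contains("chr1", 948594)
closed_hit = closed.contains("chr1", 948594)
print(half_open_hit, closed_hit)
```

With the right endpoint excluded, chr1:948594 is not a member of the interval, which matches the missing annotation reported above. In Hail, rebuilding the key with something along the lines of `hl.interval(ht.interval.start, ht.interval.end, includes_start=True, includes_end=True)` inside `key_by` is one plausible way to do this, but treat that as an assumption to check against the Hail documentation for your version.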

kousikbioinfo · June 29, 2020, 1:57pm

Ahhh Okie. Thanks Tim.

kousikbioinfo · July 2, 2020, 11:30pm

Hi,
I have a question regarding Table.index

I have a Hail table (ht)
ht.describe()

----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'CHR': int32 
    'START': int32 
    'END': int32 
    'Window_boundary': str 
    'Window_Name': str 
    'Gene_ID': str 
    'No_of_Variants': int32 
    'Range': str 
    'interval': interval<locus<GRCh38>> 
----------------------------------------
Key: ['interval', 'Gene_ID']
----------------------------------------

Now I need to annotate my mt
The key of my mt is -

----------------------------------------
Column key: ['s']
Row key: ['locus', 'gene_set']
----------------------------------------

As @tpoterba mentioned earlier in this thread, I am trying this -

mt = mt.annotate_rows(interval_data = ht.index(mt.row_key, all_matches=True))

But it gives me the error below -

Error:

ExpressionException: Key type mismatch: cannot index table with given expressions:
  Table key:         interval<locus<GRCh38>>, str
  Index Expressions: locus<GRCh38>, str

However,

mt = mt.annotate_rows(interval_data = ht.index(mt.row_key[0], all_matches=True))

works. But I need to match both conditions: (i) the variant (mt.locus) should fall within the interval (ht.interval), and (ii) the gene should match, i.e. mt.gene_set should equal ht.Gene_ID.
Note that my mt already has some duplicated mt.locus values with different mt.gene_set values, but the key as a whole is unique.

Could you please help me. Thanks

tpoterba · July 3, 2020, 12:01am

I think the solution here is to take gene_set out of the table key.

kousikbioinfo · July 3, 2020, 12:06am

But then how do I match both the conditions - locus and gene_set ?

kousikbioinfo · July 3, 2020, 12:07am

And also mt.locus is repeated. Same locus for multiple genes !

tpoterba · July 3, 2020, 11:49am

I see. Interval joins are a bit special: they are only possible when joining a key of type interval<T> against an expression of type T, and they don’t work when there are additional key fields after the interval.

I think what I’d do is (1) key the interval table by only the interval, and (2) filter the result afterwards:

# x is a row of interval_ht, so compare its Gene_ID field with the matrix table's gene_set
mt = mt.annotate_rows(interval_matches =
  interval_ht.index(mt.locus, all_matches=True).filter(
    lambda x: x.Gene_ID == mt.gene_set))

kousikbioinfo · July 3, 2020, 12:43pm

I am not sure this will give me what I want.
Let’s take an example -
chr1:1000 is a LoF variant for GeneA and GeneB, and chr1:1100 is a LoF variant for GeneC

The interval file says -

chr1:100-2000 GeneA rank1
chr1:100-2000 GeneA rank2
chr1:100-2000 GeneB rank1
chr1:100-2000 GeneC rank1

I want to annotate my MatrixTable in a way -

chr1:1000 [GeneA, GeneB]  ## this we get from all_matches=True
chr1:1100 [GeneC]

But with what you proposed, wouldn’t I get

chr1:1000 [GeneA, GeneB, GeneC]  
chr1:1100 [GeneA, GeneB, GeneC]

Eventually, I want to explode the Gene, and then chr1:1000 will be annotated with GeneC, and chr1:1100 will be annotated with GeneA and GeneB which are wrong.
Sorry I am being very complicated

kousikbioinfo · July 3, 2020, 1:08pm

Ahh wait -
So what you proposed, @tpoterba, actually filters the genes, right?
So with the above mentioned example -
it gives -

chr1:1000 [GeneA, GeneB]  ## this we get from all_matches=True
chr1:1100 [GeneC]

Right ??

tpoterba · July 3, 2020, 3:26pm

yep, exactly.
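For readers who want to check this result by hand, the example from this thread can be traced with a minimal pure-Python sketch. It does not use Hail: the tuples below are hypothetical stand-ins for the interval table and for matrix table rows keyed by (locus, gene_set), and `all_matches` only mimics the idea of an interval lookup followed by a gene filter.

```python
# Toy data from the thread's example; not Hail objects.
# Interval rows: (contig, start, end, gene, rank), end excluded.
intervals = [
    ("chr1", 100, 2000, "GeneA", "rank1"),
    ("chr1", 100, 2000, "GeneA", "rank2"),
    ("chr1", 100, 2000, "GeneB", "rank1"),
    ("chr1", 100, 2000, "GeneC", "rank1"),
]

# Matrix table rows keyed by (locus, gene_set): one row per variant/gene pair.
rows = [
    ("chr1", 1000, "GeneA"),
    ("chr1", 1000, "GeneB"),
    ("chr1", 1100, "GeneC"),
]


def all_matches(contig, pos):
    """Every interval row overlapping the locus, regardless of gene."""
    return [iv for iv in intervals if iv[0] == contig and iv[1] <= pos < iv[2]]


# Step 1: look up by locus only. Step 2: keep matches whose gene equals the row's gene.
annotated = {}
for contig, pos, gene in rows:
    hits = all_matches(contig, pos)
    annotated[(contig, pos, gene)] = [(iv[3], iv[4]) for iv in hits if iv[3] == gene]

# Collapse to one gene list per locus, as in the summary written above.
genes_per_locus = {}
for (contig, pos, gene), kept in annotated.items():
    gene_list = genes_per_locus.setdefault((contig, pos), [])
    for matched_gene, _rank in kept:
        if matched_gene not in gene_list:
            gene_list.append(matched_gene)

for key, kept in annotated.items():
    print(key, kept)
print(genes_per_locus)
```

Without the gene filter every row would carry all four interval rows, since all of them overlap both loci. With the filter, each (locus, gene) row keeps only the interval rows for its own gene, so exploding afterwards does not pair chr1:1000 with GeneC or chr1:1100 with GeneA or GeneB.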

kousikbioinfo · July 3, 2020, 8:59pm

Thank you very much @tpoterba for the confirmation
