Annotate MatrixTable with gnomAD table

Hi!
I want to use the gnomAD AF annotation obtained directly from https://gnomad.broadinstitute.org/downloads.
But I can’t join my MatrixTable with the gnomAD Table.

tbl.key

<StructExpression of type struct{locus: locus<GRCh37>, alleles: array<str>}>
mt.row_key

<StructExpression of type struct{locus: locus<GRCh37>, alleles: array<str>}>

The row keys seem to be the same, but the following command fails:

mt.annotate_rows(tbl[mt.locus])
TableIndexKeyError: (dtype('struct{locus: locus<GRCh37>, alleles: array<str>}'), (<LocusExpression of type locus<GRCh37>>,))

During handling of the above exception, another exception occurred:

ExpressionException: Key type mismatch: cannot index table with given expressions:
  Table key:         locus<GRCh37>, array<str>
  Index Expressions: locus<GRCh37>

Adding the second key field to the index also fails:

mt.annotate_rows(tbl[mt.locus, mt.alleles])
TypeError: annotate_rows() takes 1 positional argument but 2 were given

This is invalid syntax – an annotate_rows call needs names for the fields to produce:

mt = mt.annotate_rows(gnomad_fields = tbl[mt.locus])

I assume this will still fail, but the error may be more informative. Can you paste the full python stack trace when it does?

I think mt = mt.annotate_rows(gnomad_fields = tbl[mt.locus, mt.alleles]) (or mt = mt.annotate_rows(gnomad_fields = tbl[mt.row_key])) should work here, since the table is keyed by both locus and alleles.

Thanks! Now it works. For reference, this is the full stack trace from indexing with mt.locus alone:

---------------------------------------------------------------------------
TableIndexKeyError                        Traceback (most recent call last)
~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py in index(self, all_matches, *exprs)
   1503         try:
-> 1504             return self._index(*exprs, all_matches=all_matches)
   1505         except TableIndexKeyError as err:

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py in _index(self, all_matches, *exprs)
   1580             if not is_interval:
-> 1581                 raise TableIndexKeyError(self.key.dtype, exprs)
   1582 

TableIndexKeyError: (dtype('struct{locus: locus<GRCh37>, alleles: array<str>}'), (<LocusExpression of type locus<GRCh37>>,))

During handling of the above exception, another exception occurred:

ExpressionException                       Traceback (most recent call last)
<ipython-input-99-aaa890c324fe> in <module>
----> 1 mt.annotate_rows(gnomad_fields = tbl[mt.locus])

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py in __getitem__(self, item)
    364         else:
    365             try:
--> 366                 return self.index(*wrap_to_tuple(item))
    367             except TypeError as e:
    368                 raise TypeError(f"Table.__getitem__: invalid index argument(s)\n"

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py in index(self, all_matches, *exprs)
   1506             key_type, exprs = err.args
   1507 
-> 1508             raise ExpressionException(f"Key type mismatch: cannot index table with given expressions:\n"
   1509                                       f"  Table key:         {', '.join(str(t) for t in key_type.values()) or '<<<empty key>>>'}\n"
   1510                                       f"  Index Expressions: {', '.join(str(e.dtype) for e in exprs)}")

ExpressionException: Key type mismatch: cannot index table with given expressions:
  Table key:         locus<GRCh37>, array<str>
  Index Expressions: locus<GRCh37>

As konradjk suggested, the table has to be indexed with both key fields (locus and alleles) here, not with locus alone.
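
For anyone who finds this thread later, here is a rough plain-Python analogy of why the key mismatch happens. It is not Hail code and not how Hail implements joins; the dict, the sample loci and the annotate helper are all made up for illustration. The idea is only that a table keyed by (locus, alleles) can be looked up with the full key, while the locus on its own does not identify a single row.

```python
# Plain-Python analogy only; none of this is Hail API.
# A dict keyed by (locus, alleles) stands in for the gnomAD Table.
gnomad = {
    ("1:1000", ("A", "T")): {"AF": 0.01},
    ("1:1000", ("A", "G")): {"AF": 0.20},
    ("2:500", ("C", "T")): {"AF": 0.05},
}

# Row keys of a hypothetical MatrixTable, keyed the same way.
mt_rows = [
    ("1:1000", ("A", "G")),
    ("2:500", ("C", "T")),
    ("3:42", ("G", "A")),  # not present in the table
]


def annotate(rows, table):
    """Attach the table value for each full (locus, alleles) key, or None."""
    return {key: table.get(key) for key in rows}


annotated = annotate(mt_rows, gnomad)

# The locus alone is not a key of the table: two rows share "1:1000",
# so a lookup by locus cannot pick one of them.
partial_hit = "1:1000" in gnomad

print(annotated)
print(partial_hit)
```

In the analogy, rows with no match in the table simply get None, and the lookup by locus alone finds nothing because the key is the full pair.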