Annotating variants in a Hail matrix table with de novo calls

When I run de novo calling with Hail's hl.de_novo(), I get a Hail table back. Is there a straightforward way to annotate the entries of the original matrix table with whether or not each variant was called de novo, based on that table?

Thank you!

I think you can do something like:

de_novo_table = hl.de_novo(mt, ...)
mt = mt.annotate_entries(was_called_de_novo = hl.is_defined(de_novo_table[mt.row_key, mt.col_key]))

When I try that approach, I get the following error message:

ExpressionException: Key type mismatch: cannot index table with given expressions:
Table key: locus, array, str
Index Expressions: struct{locus: locus, alleles: array}, struct{s: str}

Oh, oops! The brackets should contain [mt.locus, mt.alleles, mt.s], I think. You might need this above as well:

de_novo_table = de_novo_table.key_by('locus', 'alleles', 'id')

I still seem to have the same error:

ExpressionException: Key type mismatch: cannot index table with given expressions:
Table key: locus, array, str
Index Expressions: struct{locus: locus, alleles: array}, struct{s: str}, str

Are you indexing with [mt.locus, mt.alleles, mt.s] instead of [mt.row_key, mt.col_key] now? I’d expect that error if you aren’t.

oh haha yeah, I was doing the opposite. But now I have the following error:


NotImplementedError Traceback (most recent call last)
in
----> 1 asc_mt = asc_mt.annotate_entries(is_denovo = hl.is_defined(de_novo_HC[asc_mt.locus, asc_mt.alleles, asc_mt.s]))

/usr/local/lib/python3.7/dist-packages/hail/table.py in __getitem__(self, item)
372
373 try:
--> 374 return self.index(*wrap_to_tuple(item))
375 except TypeError as e:
376 raise TypeError("Table.__getitem__: invalid index argument(s)\n"

/usr/local/lib/python3.7/dist-packages/hail/table.py in index(self, all_matches, *exprs)
1547 """
1548 try:
-> 1549 return self._index(*exprs, all_matches=all_matches)
1550 except TableIndexKeyError as err:
1551 raise ExpressionException(f"Key type mismatch: cannot index table with given expressions:\n"

/usr/local/lib/python3.7/dist-packages/hail/table.py in _index(self, all_matches, *exprs)
1677 # match on indices to determine join type
1678 if indices == src._entry_indices:
-> 1679 raise NotImplementedError('entry-based matrix joins')
1680 elif indices == src._row_indices:
1681 is_subset_row_key = len(exprs) <= len(src.row_key) and all(

NotImplementedError: entry-based matrix joins

Oh, that’s super annoying. I could have sworn we had this. Anyway, here’s a workaround:

de_novo_table = hl.de_novo(mt, ...)
de_novo_mt = de_novo_table.to_matrix_table(row_key=['locus', 'alleles'], col_key=['id'])
mt = mt.annotate_entries(was_called_de_novo = hl.is_defined(de_novo_mt[mt.row_key, mt.col_key]))
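To make the logic of the workaround concrete, here is a minimal pure-Python sketch of what the entry annotation computes. It deliberately does not use Hail: the variant and sample values are made up, and the helper name annotate_entries_with_de_novo is hypothetical. The only thing taken from the thread is the idea that a de novo call is identified by (locus, alleles, sample id), so an entry is flagged when its row key plus its column key matches a call.

```python
# Toy model of the workaround in plain Python (this is NOT the Hail API).
# A de novo call is identified by (locus, alleles, sample id), mirroring
# the de novo table key of locus, alleles, id discussed in the thread.

def annotate_entries_with_de_novo(row_keys, col_keys, de_novo_calls):
    """Return {(row_key, col_key): bool} flagging entries with a de novo call."""
    called = set(de_novo_calls)
    return {
        # row_key is (locus, alleles); appending the sample gives the full key.
        (row_key, col_key): (row_key + (col_key,)) in called
        for row_key in row_keys
        for col_key in col_keys
    }

# Hypothetical data: two variants, three samples, one de novo call.
row_keys = [("1:1000", ("A", "T")), ("1:2000", ("G", "C"))]
col_keys = ["child1", "father1", "mother1"]
de_novo_calls = [("1:1000", ("A", "T"), "child1")]

flags = annotate_entries_with_de_novo(row_keys, col_keys, de_novo_calls)
```

Every (variant, sample) pair gets a boolean, which is what hl.is_defined(...) produces per entry in the workaround: True where a matching de novo call exists, False everywhere else.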

Thank you, that seems to work! It does print the following warning message:

2020-11-06 21:58:06 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.
To preserve matrix table column order, first unkey columns with 'key_cols_by()'

This is safe to ignore. It’s a message printed the first time you call mt.cols() because sometimes people are surprised that the sample order in the matrix table is not preserved in the columns table (currently, Hail guarantees that keyed tables are sorted, while columns of a matrix table are NOT sorted by key).

The implementation of to_matrix_table calls .cols() internally, which is why the warning shows up here.
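As a rough illustration of the ordering difference behind that warning, here is a small plain-Python sketch. It is a toy model, not Hail code, and the sample IDs are made up: a list stands in for the stored column order of a matrix table, and sorting it stands in for the key-sorted columns table.

```python
# Toy model in plain Python (NOT the Hail API): a matrix table keeps its
# columns in stored order, while a keyed table is sorted by its key.
matrix_column_order = ["sampleC", "sampleA", "sampleB"]  # hypothetical IDs

# Analogue of the columns table: the same samples, sorted by the key.
cols_table_order = sorted(matrix_column_order)

# Positions no longer line up between the two, so look samples up by key
# rather than relying on their index.
position_in_matrix = {s: i for i, s in enumerate(matrix_column_order)}
```

Since the annotation in the workaround joins on keys rather than positions, the reordering does not affect the result.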

Great, thank you!