Annotate matrixtable with count from another table

#1

Hi,

I would like to annotate a matrixtable with a region count from a column in another table.

From the documentation, I got as far as detecting whether a locus falls in any region, like this:
vcf_mt.annotate_rows(region = hl.is_defined(bed_table[vcf_mt.locus]))

But instead of a boolean, I would like to get a count, so something like:
vcf_mt.annotate_rows(count = bed_table.aggregate(hl.agg.count_where(bed_table.interval.contains(vcf_mt.locus))))

That doesn’t work, though; it fails with the following error:

ExpressionException: Cannot combine expressions from different source objects.
Found fields from 2 objects:
<hail.table.Table object at 0x7fa8289c9c88>: ['interval']
<hail.matrixtable.MatrixTable object at 0x7fa82859d438>: ['locus']

Any suggestions on how to do this (efficiently) are highly welcome!

Thanks!

Mark

#2

If the bed_table weren’t keyed by an interval, then you could compute the count per key in the bed table and do a standard join. However, it is keyed by intervals, and we don’t currently support one-to-many joins in this case.
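
To illustrate the "count per key, then join" idea for the non-interval case, here is a plain-Python sketch. It is not Hail code; the names bed_keys, count_per_key and variant_loci are made up for the example, and loci are modelled as simple (contig, position) tuples.

```python
from collections import Counter

# Hypothetical table keyed by an exact locus rather than by an interval.
bed_keys = [("1", 160), ("1", 160), ("2", 500)]

# Step 1: count the rows per key.
count_per_key = Counter(bed_keys)

# Step 2: a standard one-to-one join, i.e. look each variant locus up by key,
# defaulting to 0 when the key is absent.
variant_loci = [("1", 160), ("2", 500), ("3", 10)]
annotated = [(locus, count_per_key.get(locus, 0)) for locus in variant_loci]
```

This works only because every locus matches at most one key after the counting step. With interval keys, one locus can fall inside several intervals, which is the one-to-many case mentioned above.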

Luckily, we have an open pull request to add this feature!

The code will look like:

vcf_mt.annotate_rows(count = hl.len(bed_table.index(vcf_mt.locus, all_matches=True)))
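
For clarity, the count this should produce is the number of intervals that contain each locus. Below is a minimal plain-Python sketch of that semantics. It is not Hail code, the helper count_matches is hypothetical, and it assumes intervals are half-open (start included, end excluded), which may differ from how your bed_table intervals are defined.

```python
from typing import List, Tuple

Locus = Tuple[str, int]            # (contig, position)
Interval = Tuple[str, int, int]    # (contig, start, end); assumed start-inclusive, end-exclusive

def count_matches(locus: Locus, intervals: List[Interval]) -> int:
    """Count how many intervals contain the given locus."""
    contig, pos = locus
    return sum(1 for c, start, end in intervals if c == contig and start <= pos < end)

# Two overlapping intervals on contig 1 and one interval on contig 2.
intervals = [("1", 100, 200), ("1", 150, 300), ("2", 100, 200)]
loci = [("1", 160), ("1", 250), ("3", 10)]

# One count per locus, analogous to one annotation per row.
counts = {locus: count_matches(locus, intervals) for locus in loci}
```

This scans every interval for every locus, so it is only meant to show what is being counted, not how to do it efficiently on real data.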

#3

Ha! Just checked out his branch and that does the job indeed!