How many reference genotypes are common to two samples?

I have a 2-sample matrix table.

samples = hl.literal({'sample1', 'sample2'})
mt_test = mt.filter_cols(samples.contains(mt.s), keep=True)

Now I want to take all the reference genotypes and calculate how similar the two samples are. In the end, I hope to get a single number (for example, the ratio of shared reference genotypes to all genotypes). I will use this to check my PI_HAT metrics. But I ran into a problem in the next …

mt_test.aggregate_rows(hl.agg.count_where(mt_test.GT == hl.literal(set(['0/0', '0/0']))))
TypeError: Invalid '==' comparison, cannot compare expressions of type 'call' and 'set'

What am I doing wrong?

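aggregate_rows aggregates over the row fields to produce a Python value. GT is an entry field, so you cannot use it here. The TypeError is a separate issue: GT is a call expression, and a call cannot be compared with a set of strings.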

You’re thinking about the entire array of entries as if they were a row field, which is actually reasonable! localize_entries is a method that converts a matrix table to a table with an array of entries.
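For reference, the localize_entries route could look roughly like the sketch below. It assumes a matrix table `mt` with a `GT` entry field. The function name `count_all_hom_ref_rows` and the field names `entries` and `cols` are made up for illustration, and this has not been run against your data.

```python
def count_all_hom_ref_rows(mt):
    """Count rows where every entry's GT is homozygous reference.

    Sketch only: assumes `mt` is a Hail MatrixTable with a `GT` entry field.
    """
    # Imported inside the function so that defining it does not require Hail.
    import hail as hl

    # Turn the matrix table into a table: each row now carries an array
    # of entry structs (one per sample). The field names are arbitrary.
    ht = mt.localize_entries(entries_array_field_name='entries',
                             columns_array_field_name='cols')

    # The entries are now an ordinary row field, so a single table
    # aggregation is enough.
    return ht.aggregate(
        hl.agg.count_where(ht.entries.all(lambda e: e.GT.is_hom_ref())))
```

With your two-sample matrix table this would be called as count_all_hom_ref_rows(mt_test).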

Anyway, because the matrix table has exactly two samples, you could just use hl.agg.all:

mt = mt.annotate_rows(
    everyone_is_hom_ref = hl.agg.all(mt.GT.is_hom_ref()))
n_sites_where_everyone_is_hom_ref = mt.aggregate_rows(
    hl.agg.count_where(mt.everyone_is_hom_ref))

You have to do it in two steps because this really is a two-step process: first collapse the entry fields into a row field, then collapse the row fields into a single value.
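To make the two steps concrete, here is a plain-Python model of the same computation with no Hail involved. The genotype strings and the variable `rows` are invented toy data; each inner list plays the role of one row's entries.

```python
# Toy data (invented): one inner list per variant row, one genotype per sample.
rows = [
    ["0/0", "0/0"],
    ["0/0", "0/1"],
    ["1/1", "0/0"],
    ["0/0", "0/0"],
]

# Step 1: collapse each row's entries into a single per-row value.
everyone_is_hom_ref = [all(gt == "0/0" for gt in entries) for entries in rows]

# Step 2: collapse the per-row values into a single number.
n_sites_where_everyone_is_hom_ref = sum(everyone_is_hom_ref)

# The ratio asked about in the question: shared hom-ref sites over all sites.
ratio = n_sites_where_everyone_is_hom_ref / len(rows)
```

In Hail itself, passing hl.agg.fraction(mt.everyone_is_hom_ref) to aggregate_rows should give this ratio directly in the second step, though it is worth checking that against the aggregator docs for your version.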

Thank you! Now it works! I hope that a training course will appear soon to save you from such questions. :slightly_smiling_face: