How many reference genotypes are common to two samples?

Rost · December 21, 2019, 11:58am

I have a two-sample matrix table.

samples = hl.literal(set(['sample1', 'sample2']))
mt_test = mt.filter_cols(samples.contains(mt.s), keep=True)

Now I want to take all the reference genotypes and calculate how similar the two samples are. In the end, I hope to get a single number (for example, the ratio of common reference genotypes to all). I will use this to check my PI_HAT metrics. But I ran into a problem with the next step:

mt_test.aggregate_rows(hl.agg.count_where(mt_test.GT == hl.literal(set(['0/0', '0/0']))))
TypeError: Invalid '==' comparison, cannot compare expressions of type 'call' and 'set'

What am I doing wrong?

danking · December 23, 2019, 1:48pm

aggregate_rows aggregates over the row fields to produce a Python value. GT is an entry field, so you cannot use it here.

You’re thinking about the entire array of entries as if they were a row field, which is actually reasonable! localize_entries is a method that converts a matrix table to a table with an array of entries.

Anyway, because the matrix table has exactly two samples, you could just use hl.agg.all:

mt = mt.annotate_rows(
    everyone_is_hom_ref = hl.agg.all(mt.GT.is_hom_ref()))
n_sites_where_everyone_is_hom_ref = mt.aggregate_rows(
    hl.agg.count_where(mt.everyone_is_hom_ref))

You have to do it in two steps because this really is a two-step process: first collapse the entry fields into a row field, then collapse the row fields into a single value.
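
To make the shape of that two-step aggregation concrete, here is a plain-Python toy model. It is not Hail code: the `rows` data and the `is_hom_ref` helper are made up for illustration, genotypes are modeled as simple allele-index tuples, and missing genotypes are not handled. The ratio at the end assumes that "all" means all sites in the table, which may not be the denominator you want.

```python
# Toy model (hypothetical data): each row is one variant site,
# and each inner list holds one genotype per sample.
# A genotype is a tuple of allele indices; (0, 0) is homozygous reference.
rows = [
    [(0, 0), (0, 0)],
    [(0, 0), (0, 1)],
    [(1, 1), (0, 0)],
    [(0, 0), (0, 0)],
]


def is_hom_ref(gt):
    """Return True if the genotype is homozygous reference."""
    return gt == (0, 0)


# Step 1: collapse the entries of each row into a single row-level flag
# (the analogue of annotate_rows with an entry aggregator).
everyone_is_hom_ref = [all(is_hom_ref(gt) for gt in row) for row in rows]

# Step 2: collapse the row-level flags into a single value
# (the analogue of aggregate_rows with count_where).
n_sites_where_everyone_is_hom_ref = sum(everyone_is_hom_ref)

# Assumed denominator: all sites in the table.
ratio = n_sites_where_everyone_is_hom_ref / len(rows)

print(n_sites_where_everyone_is_hom_ref, ratio)
```

In the Hail version the same count comes from the two calls above, and dividing by the number of rows in the matrix table would give the corresponding ratio under the same assumption.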

Rost · December 23, 2019, 4:11pm

Thank you! Now it works! I hope that a training course will appear soon to save you from such questions.
