Question on filtering intervals for samples based on sex

Hi,

I’m new to Hail, so I might have some very basic questions.

I am currently doing genotype QC and want to exclude chromosome Y for female participants. However, all I have been able to do is either filter the matrix table to female participants only,

mt_female = mt.filter_cols(mt.pheno.is_female)

or use hl.filter_intervals to remove chrY for both male and female participants:

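# Iterate over a list of contig names; iterating over the bare string 'chrY' would yield single characters.
intervals = [hl.parse_locus_interval(x, reference_genome='GRCh38') for x in ['chrY']]
mt_filtered = hl.filter_intervals(mt, intervals, keep=False)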

Is it possible to remove chromosome Y from female participants only?

Any advice?

Thank you.

Hail represents genetic data in the MatrixTable, which is a structured matrix of fields. The cheat sheet will be very helpful for seeing some visual representations of transformations.

“Removing chromosome Y from female participants” is not a well-defined operation on a matrix, because it means removing a block of rows (variants) for only some of the columns (samples), leaving something that isn’t actually a matrix.

Instead, you might want to filter the entries of female participants on chromosome Y. This means removing entries from the matrix, leaving a matrix that looks like Swiss cheese, with holes where the filtered entries used to be.

This would look like:


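# Flag variants in the non-pseudoautosomal region of chromosome Y.
mt = mt.annotate_rows(is_y = mt.locus.in_y_nonpar())
# Drop (keep=False) the entries where the variant is on chrY and the sample is female.
mt = mt.filter_entries(mt.is_y & mt.pheno.is_female, keep=False)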
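
To make the "holes" picture concrete, here is a toy sketch in plain Python. It is not Hail and does not use Hail's API: the lists, the genotype strings, and the filter_entries helper are all made up for illustration, and it matches on the whole chrY contig rather than the non-PAR region. It only shows that entry filtering keeps the matrix shape and blanks out individual cells.

```python
# Toy model only: a plain-Python stand-in for a MatrixTable, not Hail's API.
# Rows are variants (identified by contig), columns are samples (is_female flags).
row_contigs = ["chr1", "chrX", "chrY", "chrY"]
col_is_female = [True, False, True]

# Every entry starts out as a called genotype, written here as a string.
entries = [["0/1" for _ in col_is_female] for _ in row_contigs]


def filter_entries(entries, row_contigs, col_is_female):
    """Return a copy with chrY entries of female samples set to None (a 'hole')."""
    return [
        [None if (contig == "chrY" and female) else gt
         for gt, female in zip(row, col_is_female)]
        for row, contig in zip(entries, row_contigs)
    ]


filtered = filter_entries(entries, row_contigs, col_is_female)

# The shape is unchanged: no rows or columns were dropped.
n_rows, n_cols = len(filtered), len(filtered[0])
# Holes appear only where a chrY row meets a female column.
n_holes = sum(gt is None for row in filtered for gt in row)
print(n_rows, n_cols, n_holes)
```

Row and column filters (filter_rows, filter_cols) would instead shrink the matrix, which is why neither of them can express "chrY for females only".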

That works. Thank you!