Filter variants based on other files

I have a Hail matrix table (h1) with variants and samples, and a txt file derived from the ClinVar VCF. I would like to filter out the variants (rows) that are not in the ClinVar txt file, but I am not sure how.

The Hail matrix table is keyed on chr:pos and the array of alleles. I successfully imported the txt file as a Hail table (no key) and tried to annotate h1 with this table, then filter on the new field.

However, I don’t know how to key the txt file the same way as h1 so that I can annotate and then filter. Or am I thinking about this completely wrong?

Sorry for the basic question. I started using Hail only recently and would greatly appreciate any ideas for this situation.

Thank you!

How are the variants formatted in the clinvar table?

It will look something like:

# first annotate the clinvar table with `locus` and `alleles` fields, then key by them
clinvar = clinvar.key_by('locus', 'alleles')

# semi_join_rows keeps the rows whose keys overlap with the table's keys
mt = mt.semi_join_rows(clinvar) 

These four are the columns that should be keyed… and the h1 key looks like:
chr1:17018956 ["A","T"]

The missing bit can be:


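# build the key fields from the four ClinVar columns; reference_genome must match h1's
# (Hail's built-in GRCh37 uses contigs like '1', GRCh38 uses 'chr1')
clinvar = clinvar.annotate(
    locus = hl.locus(clinvar.Chromosome, clinvar.PositionVCF, reference_genome='GRCh37'),
    alleles = [clinvar.Reference, clinvar.AlternateAlleleVCF]
)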

I should say, if you have the ClinVar VCF itself, just importing that with hl.import_vcf would make this easier, since the result is already keyed by locus and alleles!
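If it helps to see why the keys have to match exactly, here is a rough plain-Python sketch of what a semi-join on (locus, alleles) does. This is not Hail code: `make_key` and `semi_join` are hypothetical helpers, the rows are made up, and the `chr` prefix handling assumes h1 uses `chr1`-style contigs as in the key shown above.

```python
# Plain-Python illustration (not Hail): a row is kept only when its
# (contig, position, alleles) key also appears among the ClinVar-derived keys.

def make_key(chromosome, position, ref, alt, add_chr_prefix=True):
    """Build a (contig, position, alleles) key from four text columns."""
    contig = str(chromosome)
    # Assumption: the matrix table uses 'chr1'-style contig names.
    if add_chr_prefix and not contig.startswith("chr"):
        contig = "chr" + contig
    return (contig, int(position), (ref, alt))


def semi_join(variant_keys, clinvar_keys):
    """Keep the variant keys that are also present in clinvar_keys."""
    wanted = set(clinvar_keys)
    return [key for key in variant_keys if key in wanted]


# Made-up rows standing in for the four ClinVar text columns.
clinvar_rows = [
    {"Chromosome": "1", "PositionVCF": "17018956",
     "Reference": "A", "AlternateAlleleVCF": "T"},
    {"Chromosome": "2", "PositionVCF": "500",
     "Reference": "G", "AlternateAlleleVCF": "C"},
]
clinvar_keys = [
    make_key(r["Chromosome"], r["PositionVCF"],
             r["Reference"], r["AlternateAlleleVCF"])
    for r in clinvar_rows
]

# Made-up keys standing in for the rows of h1.
h1_keys = [
    ("chr1", 17018956, ("A", "T")),  # also in clinvar_keys, so kept
    ("chr1", 17018956, ("A", "G")),  # same locus, different alt allele, dropped
    ("chr3", 42, ("C", "T")),        # not in clinvar_keys, dropped
]

kept = semi_join(h1_keys, clinvar_keys)
print(kept)
```

The second h1 key is dropped even though the locus matches, because the alleles are part of the key too. The same applies to the contig name: a ClinVar key built as `1` would never match an h1 key of `chr1`.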