nonchev
1
Hello,
I am trying to change the reference genome of a MatrixTable, something like:
mt = hl.methods.read_matrix_table(path)  # here it was saved with reference_genome=None
mt.set_reference_genome('GRCh37')
or, alternatively, on the Table:
ht = hl.Table.from_pandas(var)
ht = ht.key_by(**hl.parse_variant(ht.chrom + hl.literal(':') + hl.str(ht.pos) + hl.literal(':') + ht.ref + hl.literal(':') + ht.alt))
ht.set_reference_genome(None)
so that reference_genome=None is used here too.
Is this possible?
After that, I want to execute:
ht = hl.MatrixTable.from_rows_table(ht)
result = mt.filter_rows(~hl.is_missing(ht.index_rows(mt['locus'], mt['alleles'])))
but I get an error saying that the two locus fields must have the same type.
Thanks in advance!
Best,
nonchev
2
If the reference genome differs between the two datasets, this works:
result = mt.filter_rows(~hl.is_missing(ht.index_rows(hl.locus(mt.locus.contig, mt.locus.position), mt['alleles'])))
but it seems to be slow.
I think the syntax here could be causing problems for the Hail optimizer. There's a simpler way to do this, which I believe will perform better: index the Table directly instead of converting it with from_rows_table:
result = mt.filter_rows(~hl.is_missing(ht.index(mt['locus'], mt['alleles'])))
or, equivalently but shorter:
result = mt.filter_rows(~hl.is_missing(ht.index(mt.row_key)))
or, equivalently and even shorter:
result = mt.semi_join_rows(ht)