Merge MTs different order of alleles and locus

Hi,

I’m trying to merge 1KG and my mt to perform PCA. After merging I have this error:

ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array<str>, locus: locus<GRCh38>}
dataset 1: struct{locus: locus<GRCh38>, alleles: array<str>}

So it looks like the order in the 1KG dataset is alleles, locus and in mine it is locus, alleles. I'm not sure why this is the case, but can anyone suggest how to swap one over to match the other? The syntax escapes me.

Thanks a lot

Dan

for dataset0, do mt = mt.key_rows_by('locus', 'alleles')

Thanks a lot for the advice Tim,

It doesn’t seem to work

I have tried:

thousand_mt_prune = thousand_mt_prune.key_rows_by('locus', 'alleles')
mt_prune = mt_prune.key_rows_by('locus', 'alleles')

thousand_mt_prune.rows().show()
mt_prune.rows().show()

dataset_result = thousand_mt_prune.union_rows(mt_prune)

and I get:

alleles | locus
array<str> | locus<GRCh38>

locus| alleles
locus<GRCh38> | array<str>

ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array<str>, locus: locus<GRCh38>}
dataset 1: struct{locus: locus<GRCh38>, alleles: array<str>}

Oh! The key type is the same, the order is different. This is a little annoying to fix. We have a to-do item to add type unification like Table.union has, but for now, you should be able to do the following:

mt = mt.rename({'locus': 'locus2', 'alleles': 'alleles2'})
mt = mt.select_rows(locus=mt.locus2, alleles=mt.alleles2)
mt = mt.key_rows_by('locus', 'alleles')

Hacky.
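For anyone reading this later, here is a toy illustration in plain Python (not Hail code) of why re-keying with the same key names was not enough. It assumes, based on the error message above, that a row type behaves like an ordered list of fields, so two structs with the same fields in a different order compare as different types. The names `same_row_type` and `reorder_fields` are made up for this sketch and are not part of any library.

```python
# Toy model only: a row type is an ordered list of (field name, type) pairs.
# This is an assumption used for illustration, not Hail's implementation.

def same_row_type(a, b):
    """Two row types match only if names, types AND order all agree."""
    return list(a) == list(b)


def reorder_fields(schema, order):
    """Return a new schema with the fields named in `order` first, in that order.

    Any fields not listed in `order` keep their relative order at the end.
    """
    by_name = dict(schema)
    missing = [name for name in order if name not in by_name]
    if missing:
        raise KeyError(f"fields not in schema: {missing}")
    rest = [(name, typ) for name, typ in schema if name not in order]
    return [(name, by_name[name]) for name in order] + rest


# The two row types from the error message above.
kg_schema = [("alleles", "array<str>"), ("locus", "locus<GRCh38>")]
my_schema = [("locus", "locus<GRCh38>"), ("alleles", "array<str>")]

# Same fields, different order: treated as different types.
before = same_row_type(kg_schema, my_schema)

# Rebuilding the struct in the wanted order makes the types line up.
fixed = reorder_fields(kg_schema, ["locus", "alleles"])
after = same_row_type(fixed, my_schema)

print(before, after)
```

The rename / select_rows / key_rows_by steps above play the role of `reorder_fields` here: they rebuild the row struct so its fields come out in the same order as in the other dataset.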

Hi Tim,

That’s fantastic, hacky but it works fine :)

Thanks a lot

Dan