Hi,
I’m trying to merge 1KG and my mt to perform PCA. After merging I have this error:
ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array, locus: locus}
dataset 1: struct{locus: locus, alleles: array}
So it looks like the field order in the 1KG dataset is alleles, locus and in mine it is locus, alleles. I’m not sure why this is the case, but can anyone suggest how to swap one over to match? The syntax escapes me.
Thanks a lot
Dan
For dataset 0, do mt = mt.key_rows_by('locus', 'alleles')
Thanks a lot for the advice Tim,
It doesn’t seem to work
I have tried:
thousand_mt_prune = thousand_mt_prune.key_rows_by('locus', 'alleles')
mt_prune = mt_prune.key_rows_by('locus', 'alleles')
thousand_mt_prune.rows().show()
mt_prune.rows().show()
dataset_result = thousand_mt_prune.union_rows(mt_prune)
and I get:
alleles | locus
array<str> | locus<GRCh38>
locus | alleles
locus<GRCh38> | array<str>
ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array, locus: locus}
dataset 1: struct{locus: locus, alleles: array}
Oh! The key fields are the same, but their order is different. This is a little annoying to fix. We have a to-do item to add type unification like Table.union has, but for now you should be able to do the following:
mt = mt.rename({'locus': 'locus2', 'alleles': 'alleles2'})
mt = mt.select_rows(locus=mt.locus2, alleles=mt.alleles2)
mt = mt.key_rows_by('locus', 'alleles')
Hacky.
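For anyone wondering why two row types with the same fields count as different: the error output suggests field order is part of the struct type. Here is a toy model in plain Python, not Hail code. The tuple representation and the reorder helper are made up for illustration, and the type names are taken from the show() output above.

```python
# Toy model (plain Python, not the Hail API): a struct type as an ordered
# tuple of (field name, type name) pairs, so field order is part of the type.
dataset0 = (("alleles", "array<str>"), ("locus", "locus<GRCh38>"))
dataset1 = (("locus", "locus<GRCh38>"), ("alleles", "array<str>"))


def reorder(struct_type, field_order):
    """Return struct_type with its fields arranged in field_order (hypothetical helper)."""
    by_name = dict(struct_type)
    return tuple((name, by_name[name]) for name in field_order)


# Same fields, different order: the two types compare as unequal.
same_before = dataset0 == dataset1

# Rebuild dataset 0 in dataset 1's field order, which is what the
# rename / select_rows / key_rows_by workaround is meant to achieve.
fixed0 = reorder(dataset0, [name for name, _ in dataset1])
same_after = fixed0 == dataset1

print(same_before, same_after)  # prints: False True
```

Calling key_rows_by with the same field names did not change the order in the attempt above, which is presumably why the fields have to be rebuilt under temporary names first.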
Hi Tim,
That’s fantastic. Hacky, but it works fine.
Thanks a lot
Dan