Merge MTs different order of alleles and locus

danchubb · July 15, 2020, 12:00pm

Hi,

I’m trying to merge 1KG and my mt to perform PCA. When merging, I get this error:

ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array<str>, locus: locus<GRCh38>}
dataset 1: struct{locus: locus<GRCh38>, alleles: array<str>}

So it looks like the field order in the 1KG dataset is alleles, locus, while in mine it is locus, alleles. I’m not sure why this is the case, but can anyone suggest how to swap one of them so they match? The syntax escapes me.
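One way to see why this fails: Hail struct types are ordered, so two row types with the same fields listed in a different order count as different types, which is exactly what the error reports. The plain-Python sketch below is only a toy model (the list-of-pairs schema and the `diagnose` helper are hypothetical, not part of Hail); in Hail itself you can compare the two row schemas by calling `mt.describe()` on each dataset.

```python
# Toy model in plain Python (not Hail's API): a row schema is an ordered
# list of (field_name, type_string) pairs, mirroring how a Hail struct
# type keeps its fields in a fixed order.
from typing import List, Tuple

Schema = List[Tuple[str, str]]


def diagnose(schema_a: Schema, schema_b: Schema) -> str:
    """Explain why two row schemas would or would not be the same type."""
    if schema_a == schema_b:
        return "identical"
    if sorted(schema_a) == sorted(schema_b):
        # Same fields with the same types, only listed in a different order.
        return "same fields, different order"
    return "different fields or types"


# The two row types from the error message above.
thousand_genomes = [("alleles", "array<str>"), ("locus", "locus<GRCh38>")]
mine = [("locus", "locus<GRCh38>"), ("alleles", "array<str>")]
print(diagnose(thousand_genomes, mine))  # same fields, different order
```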

Thanks a lot

Dan

tpoterba · July 15, 2020, 12:04pm

for dataset0, do mt = mt.key_rows_by('locus', 'alleles')

danchubb · July 15, 2020, 1:07pm

Thanks a lot for the advice Tim,

It doesn’t seem to work

I have tried:

thousand_mt_prune = thousand_mt_prune.key_rows_by('locus', 'alleles')
mt_prune = mt_prune.key_rows_by('locus', 'alleles')

thousand_mt_prune.rows().show()
mt_prune.rows().show()

dataset_result = thousand_mt_prune.union_rows(mt_prune)

and I get:

alleles | locus
array<str> | locus<GRCh38>

locus| alleles
locus<GRCh38> | array<str>

ValueError: 'MatrixTable.union_rows' expects row types for all datasets to be the same. Found:
dataset 0: struct{alleles: array<str>, locus: locus<GRCh38>}
dataset 1: struct{locus: locus<GRCh38>, alleles: array<str>}

tpoterba · July 15, 2020, 3:41pm

Oh! The key types are the same; it’s the order of the row fields that differs. This is a little annoying to fix. We have a to-do item to add type unification like Table.union has, but for now you should be able to do the following:

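# Move the existing fields out of the way under temporary names
mt = mt.rename({'locus': 'locus2', 'alleles': 'alleles2'})
# Recreate locus and alleles, in that order, from the renamed fields
mt = mt.select_rows(locus=mt.locus2, alleles=mt.alleles2)
mt = mt.key_rows_by('locus', 'alleles')
# If locus2/alleles2 are still row fields here, drop them: mt = mt.drop('locus2', 'alleles2')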

Hacky.
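The workaround above works by rebuilding the row fields in the order the other dataset uses. As a rough, Hail-free illustration of that idea, the sketch below reorders a row represented as a Python dict to match a reference field order. The `reorder_fields` helper and the example row values are hypothetical and are not part of Hail.

```python
# Toy sketch in plain Python (not Hail's API): rebuild a row so its fields
# follow a reference order, roughly what the rename/select/re-key
# workaround does for the MatrixTable row type.
from typing import Any, Dict, List


def reorder_fields(row: Dict[str, Any], target_order: List[str]) -> Dict[str, Any]:
    """Return a new row whose fields appear in target_order; other fields are dropped."""
    missing = [name for name in target_order if name not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    # Dicts preserve insertion order (Python 3.7+), so inserting the fields
    # in target_order fixes their order in the result.
    return {name: row[name] for name in target_order}


# Hypothetical 1KG-style row whose fields come in the order alleles, locus.
row_1kg = {"alleles": ["A", "G"], "locus": "chr1:10177"}
fixed = reorder_fields(row_1kg, ["locus", "alleles"])
print(list(fixed))  # ['locus', 'alleles']
```

Dropping fields that are not in the target order mirrors the goal of ending up with exactly the row type of the other dataset, so that the two types compare equal.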

danchubb · July 16, 2020, 4:15pm

Hi Tim,

That’s fantastic. Hacky, but it works fine.

Thanks a lot

Dan

Related topics

Topic | Category | Replies | Views | Last activity
How to fix the error of 'MatrixTable.union_rows' expects all datasets to have the same columns | Hail Query & hailctl | 3 | 660 | January 3, 2022
Mergin MatrixTable raised strange row type error | Hail Query & hailctl | 9 | 555 | June 28, 2021
After the union_cols() number of rows decreases | Hail Query & hailctl | 1 | 467 | January 19, 2022
Concat rows two matrix tables | Hail Query & hailctl | 5 | 444 | December 8, 2023
Difficulty joining ht to mt | Hail Query & hailctl | 3 | 600 | June 15, 2022