Hi Hail team,
I’m trying to merge all of my per-chromosome MatrixTables into one MatrixTable so I can run sample QC.
The datasets have exactly the same samples, but I can’t get union_rows() to run successfully; see below.
The code I ran:
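import hail as hl

# Per-chromosome MatrixTables that contain the same samples
mt_2 = "chr2.mt"
mt_20 = "chr20.mt"
mt2 = hl.read_matrix_table(mt_2)
print("mt2.count() = {}".format(mt2.count()))
mt20 = hl.read_matrix_table(mt_20)
print("mt20.count() = {}".format(mt20.count()))
# Union rows: stack chr2 and chr20 variants into one MatrixTable
all_mt = mt2.union_rows(mt20)
all_mt.count()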
The output and error message:
mt2.count() = (4796512, 366)
mt20.count() = (1512641, 366)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/738332.tmpdir/ipykernel_33/2717554317.py in <module>
10 print("mt20.count() = {}".format(mt20.count()))
11
---> 12 all_mt = mt2.union_rows(mt20)
13 all_mt.count()
<decorator-gen-1304> in union_rows(_check_cols, *datasets)
/opt/conda/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
575 def wrapper(__original_func, *args, **kwargs):
576 args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)
578
579 return wrapper
/opt/conda/lib/python3.7/site-packages/hail/matrixtable.py in union_rows(_check_cols, *datasets)
3621 .find(lambda x: ~(x[1] == first_keys))[0])))
3622 if wrong_keys is not None:
-> 3623 raise ValueError(f"'MatrixTable.union_rows' expects all datasets to have the same columns. "
3624 f"Datasets 0 and {wrong_keys + 1} have different columns (or possibly different order).")
3625 return MatrixTable(ir.MatrixUnionRows(*[d._mir for d in datasets]))
ValueError: 'MatrixTable.union_rows' expects all datasets to have the same columns. Datasets 0 and 1 have different columns (or possibly different order).
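For anyone hitting the same error: judging from the matrixtable.py lines in the traceback, union_rows collects each dataset's column keys into an array and compares that array with the first dataset's, so the check is positional and fails even when both datasets hold exactly the same samples. Below is a plain-Python analogue for telling an ordering problem apart from a membership problem. The function name compare_column_keys is hypothetical (not a Hail API), and the Hail usage line assumes the column key is the single string field s, as in the output further down.

```python
def compare_column_keys(first_ids, other_ids):
    """Classify how two lists of column-key values differ.

    Mirrors the positional comparison union_rows appears to perform:
    only element-by-element equal lists pass the check.
    """
    first_ids, other_ids = list(first_ids), list(other_ids)
    if first_ids == other_ids:
        return "identical"
    # Same multiset of IDs in both lists means only the order differs.
    if sorted(first_ids) == sorted(other_ids):
        return "same samples, different order"
    return "different samples"


# Usage with Hail (not run here; assumes the column key field is named s):
# print(compare_column_keys(mt2.s.collect(), mt20.s.collect()))
```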
Looking at the source code (hail.matrixtable), it seems my columns are in a different order in the two datasets.
Does anyone know how to fix this? I’m thinking of reordering the columns, but I’m not sure how to do it.
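One way to do the reordering, sketched under the assumption that every dataset has the same unique sample IDs in a string column key s: look up where each of mt2's samples sits in the other dataset's column order and pass those positions to MatrixTable.choose_cols, which builds a dataset from a list of old column indices. The helper names (column_order_indices, align_cols_to, union_rows_aligned) are hypothetical. The traceback also shows a private _check_cols argument; switching it off would presumably skip the check without reordering anything, so genotypes could end up attached to the wrong samples.

```python
def column_order_indices(reference_ids, other_ids):
    """Return indices into other_ids that put its samples in reference_ids' order."""
    if len(set(reference_ids)) != len(reference_ids):
        raise ValueError("reference dataset has duplicate sample IDs")
    if sorted(reference_ids) != sorted(other_ids):
        raise ValueError("datasets do not contain exactly the same samples")
    # Position of each sample ID in the other dataset's current column order.
    position = {sample: i for i, sample in enumerate(other_ids)}
    return [position[sample] for sample in reference_ids]


def align_cols_to(reference_mt, other_mt):
    """Reorder other_mt's columns to match reference_mt (assumes col key field s)."""
    indices = column_order_indices(reference_mt.s.collect(), other_mt.s.collect())
    return other_mt.choose_cols(indices)


def union_rows_aligned(mts):
    """Align every MatrixTable's columns to the first one, then union their rows."""
    first = mts[0]
    if len(mts) == 1:
        return first
    aligned = [align_cols_to(first, mt) for mt in mts[1:]]
    return first.union_rows(*aligned)


# Usage with Hail (not run here):
# import hail as hl
# mts = [hl.read_matrix_table(p) for p in ["chr2.mt", "chr20.mt"]]
# all_mt = union_rows_aligned(mts)
```

Since union_rows accepts several datasets in one call, the same helper should extend to all chromosomes by putting every per-chromosome MatrixTable in the list.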
# chr2:
mt2_col_list = mt2.col_key.collect()
mt2_col_list[:10]
[Struct(s='TWHJ-PNRR-10145'),
Struct(s='TWHJ-PNRR-10826-10826'),
Struct(s='TWHJ-PNRR-10245'),
Struct(s='TWHJ-PNRR-10703'),
Struct(s='TWHJ-PNRR-10867-10867'),
Struct(s='TWHJ-PNRR-10787'),
Struct(s='TWHJ-PNRR-10833-10833'),
Struct(s='TWHJ-PNRR-10859-10859'),
Struct(s='TWHJ-PNRR-10716-10716'),
Struct(s='TWHJ-PNRR-10823-10823')]
# chr20:
mt20_col_list = mt20.col_key.collect()
mt20_col_list[:10]
[Struct(s='TWHJ-PNRR-10800-10800'),
Struct(s='TWHJ-PNRR-10577'),
Struct(s='TWHJ-PNRR-10332-10332'),
Struct(s='TWHJ-PNRR-10257'),
Struct(s='TWHJ-PNRR-10388'),
Struct(s='TWHJ-PNRR-10951'),
Struct(s='TWHJ-PNRR-10954'),
Struct(s='TWHJ-PNRR-10105-10105'),
Struct(s='TWHJ-PNRR-10188'),
Struct(s='TWHJ-PNRR-10453')]
Thanks for your help, and happy new year!
Best,
Po-Ying