I would like a Hail script that reads the individual per-chromosome matrix tables into one larger matrix table for LD pruning and running pc_relate. I tried using the wildcard chr* but got an error. Is there a way to read in multiple matrix tables?
mt_fn = '/project/adgc/imp.topmed_adsp5k/mt/adgc.aa.imp30r2.topmed_adsp5k.chr*.mt'
mt = hl.read_matrix_table(mt_fn)
Hail version: 0.2.19-c6ec8b76eb26
Error summary: HailException: MatrixTable and Table files are directories; path ‘/project/adgc/imp.topmed_adsp5k/mt/adgc.aa.imp30r2.topmed_adsp5k.chr*.mt’ is not a directory
This is intentional; a MatrixTable is already a composite object, so globbing them shouldn’t be a common use case.
You can just iterate in Python. First, let’s define a helper function that unions the tables pairwise, which makes the nested union N log N rather than quadratic (see here for more info):
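def union_cols_all(mts):
    # Union the tables pairwise, round by round, until one remains.
    mts = mts[:]
    iteration = 0
    while len(mts) > 1:
        iteration += 1
        print(f'iteration {iteration}')
        tmp = []
        for i in range(0, len(mts), 2):
            if i + 1 < len(mts):
                tmp.append(mts[i].union_cols(mts[i + 1]))
            else:
                # Odd table out: carry it into the next round unchanged.
                tmp.append(mts[i])
        mts = tmp
    return mts[0]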
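To see why the pairwise approach helps, here is a minimal sketch of the same reduction on plain Python values, with no Hail involved. The names `pairwise_reduce` and `combine_depth` are made up for illustration; each stand-in "table" is just a label plus a nesting depth, so the depth of the final expression can be compared with a simple left-to-right fold.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def pairwise_reduce(items: List[T], combine: Callable[[T, T], T]) -> T:
    """Combine items two at a time, round by round, until one remains."""
    if not items:
        raise ValueError("need at least one item")
    items = list(items)
    while len(items) > 1:
        merged = []
        for i in range(0, len(items), 2):
            if i + 1 < len(items):
                merged.append(combine(items[i], items[i + 1]))
            else:
                # Odd item out: carry it into the next round unchanged.
                merged.append(items[i])
        items = merged
    return items[0]


def combine_depth(a, b):
    """Stand-in for a union: join the labels and track nesting depth."""
    return (f"({a[0]}+{b[0]})", max(a[1], b[1]) + 1)


# 24 stand-in tables: chromosomes 1-22 plus X and Y.
leaves = [(str(c), 0) for c in list(range(1, 23)) + ["X", "Y"]]

_, tree_depth = pairwise_reduce(leaves, combine_depth)
_, fold_depth = reduce(combine_depth, leaves)
print(tree_depth, fold_depth)  # 5 23
```

Because the combine step is a parameter, the same sketch would work with `lambda a, b: a.union_cols(b)` or `lambda a, b: a.union_rows(b)`. Per the Hail documentation, `union_cols` appends samples and keeps only rows present in both tables, while `union_rows` appends variants for tables that share the same samples, so it is worth checking which of the two matches per-chromosome tables before running the full job.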
And then read and union in Python:
files = [f'/path/to/chr{chrom}' for chrom in list(range(1, 23)) + ['X', 'Y']]
mt = union_cols_all([hl.read_matrix_table(file) for file in files])
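If you would rather keep the chr* wildcard than hard-code the chromosome list, one option is to expand it in Python before reading. This is only a sketch: `expand_mt_paths` is a hypothetical helper, it assumes the tables sit on a local or POSIX-mounted filesystem (the standard `glob` module will not see `hdfs://` or `gs://` paths), and it assumes the directory names end in `chr<name>.mt` as in the original question.

```python
import glob
import os
import re


def expand_mt_paths(pattern):
    """Return matrix table directories matching `pattern`, in chromosome order."""

    def chrom_key(path):
        # Pull the chromosome name out of a trailing "chr<name>.mt".
        m = re.search(r"chr(\w+?)\.mt$", path.rstrip("/"))
        chrom = m.group(1) if m else ""
        # Autosomes first in numeric order, then X, Y and anything else by name.
        if chrom.isdigit():
            return (0, int(chrom), "")
        return (1, 0, chrom)

    # A MatrixTable is a directory, so skip any plain files that match.
    dirs = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    return sorted(dirs, key=chrom_key)
```

The resulting list can then replace `files` above, for example `files = expand_mt_paths(mt_fn)`, with each path still read individually by `hl.read_matrix_table`.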