How to write a MatrixTable to a file as a tab separated table in wide format?

leocob · March 17, 2020, 11:06am

Hello Hail team!

I recently started using Hail 0.2 and got stuck on the following problem.

Is there a way to write a Hail MatrixTable to a tab-separated file in wide format?

In more detail: I have a MatrixTable with this structure:

mt.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'gene_id': str
----------------------------------------
Entry fields:
    'number': int64
----------------------------------------
Column key: ['s']
Row key: ['gene_id']
----------------------------------------

mt.show()

gene_id	Sample1.number	Sample2.number	Sample3.number	Sample4.number
str	int64	int64	int64	int64
"ENSG00000000457"	9	5	5	5
"ENSG00000000460"	23	13	11	11
"ENSG00000000971"	0	4	4	4
"ENSG00000001036"	3	3	3	3

I would like to write the MatrixTable to a file, keeping the same wide format that mt.show() displays.

The problem is that I have around 15k genes and around 500k samples.

I tried make_table(), but after running for 24 hours (on a cluster node with 28 processors) it was only about halfway done.

I also tried ht = mt.entries() followed by ht.export("myfile.tsv.bgz"). After unzipping the bgz file, I got a 170-gigabyte file containing the data I wanted, but in long format. For example:

gene_id         s       number
ENSG00000000419 Sample1 5
ENSG00000000419 Sample2 8
ENSG00000000419 Sample3 0
ENSG00000000419 Sample4 3
ENSG00000000419 Sample5 23
ENSG00000000419 Sample6 14

If the file was smaller I could easily pivot the table using pandas or tidyverse. But I’m afraid that 170 gigabytes is a bit too much. I could try using Spark or other strategies for big data but I keep telling myself there should be a way to do this natively with Hail!

Can anyone help me? Thank you!
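
As a fallback for a long-format export like the one above, a pivot does not have to load the whole file. If the rows for each gene are adjacent and list the samples in the same order (an assumption about the export, not something confirmed in this thread), the file can be streamed one gene at a time. The sketch below uses only the Python standard library; `pivot_long_to_wide` is a hypothetical helper name, and the tiny input reuses two genes and two samples from the mt.show() output.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def pivot_long_to_wide(long_rows, out):
    """Stream (gene_id, s, number) rows into a wide tab-separated table.

    Assumes rows for the same gene_id are adjacent and that every gene
    lists the same samples in the same order. Only one gene's rows are
    held in memory at a time.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = None
    for gene_id, group in groupby(long_rows, key=itemgetter(0)):
        group = list(group)
        samples = [s for _, s, _ in group]
        if header is None:
            # The first gene defines the column order of the output.
            header = samples
            writer.writerow(["gene_id"] + header)
        elif samples != header:
            raise ValueError(f"sample order differs for {gene_id}")
        writer.writerow([gene_id] + [number for _, _, number in group])


# Illustrative input: a small slice in the same layout as the long export.
long_tsv = (
    "gene_id\ts\tnumber\n"
    "ENSG00000000457\tSample1\t9\n"
    "ENSG00000000457\tSample2\t5\n"
    "ENSG00000000460\tSample1\t23\n"
    "ENSG00000000460\tSample2\t13\n"
)
reader = csv.reader(io.StringIO(long_tsv), delimiter="\t")
next(reader)  # skip the header line of the long file
wide = io.StringIO()
pivot_long_to_wide(reader, wide)
print(wide.getvalue(), end="")
```

For a real file, `reader` would wrap the opened export and `wide` would be an output file handle. If the adjacency assumption does not hold, the long file would need to be sorted by gene_id first.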

tpoterba · March 17, 2020, 11:58am

I think this should work:

mt.number.export('myfile.tsv.bgz')

This is a bit hard to find in the docs, sorry.

leocob · March 17, 2020, 9:32pm

Thank you so much! It works!
If I had known, I would have asked sooner. I spent more than a week trying to solve this problem!
Next time I’ll know I can count on the fast support of the Hail team.

tpoterba · March 17, 2020, 9:34pm

The more questions we get, the more information we have about how we can improve our docs! I promise we’ll actually start using that information to improve them soon…
