How to write a MatrixTable to a file as a tab separated table in wide format?

Hello Hail team!

I’ve recently started using Hail 0.2 but I got stuck with the following problem.

Is there a way to write a Hail MatrixTable to a tab-separated file in wide format?

In more detail: I have a MatrixTable with this structure:

mt.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'gene_id': str
----------------------------------------
Entry fields:
    'number': int64
----------------------------------------
Column key: ['s']
Row key: ['gene_id']
----------------------------------------

mt.show()

gene_id Sample1.number Sample2.number Sample3.number Sample4.number
str int64 int64 int64 int64
"ENSG00000000457" 9 5 5 5
"ENSG00000000460" 23 13 11 11
"ENSG00000000971" 0 4 4 4
"ENSG00000001036" 3 3 3 3

I would like to write the MatrixTable to a file, keeping the same wide format that mt.show() displays.

The problem is that I have around 15k genes and around 500k samples.

I tried make_table(), but after running for 24 hours (on a cluster node with 28 processors) it was only about halfway through.

I also tried ht = mt.entries() followed by ht.export("myfile.tsv.bgz"). After decompressing the bgz file I got a 170-gigabyte file that contains the data I wanted, but in long format. For example:

gene_id s number
ENSG00000000419 Sample1 5
ENSG00000000419 Sample2 8
ENSG00000000419 Sample3 0
ENSG00000000419 Sample4 3
ENSG00000000419 Sample5 23
ENSG00000000419 Sample6 14

If the file were smaller, I could easily pivot the table using pandas or tidyverse, but I'm afraid 170 gigabytes is too much for that. I could try Spark or another big-data approach, but I keep telling myself there should be a way to do this natively with Hail!

Can anyone help me? Thank you! :)

I think this should work:

mt.number.export('myfile.tsv.bgz')

This is a bit hard to find in the docs, sorry.
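
In case it helps anyone who already has the long-format file from mt.entries(), here is a minimal sketch of pivoting it to wide format in plain Python, one gene at a time, so the whole file never has to fit in memory. This is not a Hail API: pivot_long_to_wide is a hypothetical helper, and it assumes that all rows for a given gene_id are adjacent in the file and that the first gene lists every sample.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def pivot_long_to_wide(long_rows, out, missing="NA"):
    """Stream (gene_id, s, number) rows into wide, tab-separated rows.

    Assumption: rows for the same gene_id are adjacent in the input.
    The sample order of the header is taken from the first gene seen.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    samples = None
    for gene_id, group in groupby(long_rows, key=itemgetter(0)):
        # Only one gene's values are held in memory at a time.
        values = {s: number for _, s, number in group}
        if samples is None:
            samples = list(values)
            writer.writerow(["gene_id", *samples])
        writer.writerow([gene_id, *(values.get(s, missing) for s in samples)])


# Tiny in-memory demonstration with made-up rows in the long layout.
long_text = (
    "gene_id\ts\tnumber\n"
    "ENSG00000000419\tSample1\t5\n"
    "ENSG00000000419\tSample2\t8\n"
    "ENSG00000000457\tSample1\t9\n"
    "ENSG00000000457\tSample2\t5\n"
)
reader = csv.reader(io.StringIO(long_text), delimiter="\t")
next(reader)  # skip the header line
wide = io.StringIO()
pivot_long_to_wide(reader, wide)
print(wide.getvalue())
```

For a real export, the same function could be fed from gzip.open("myfile.tsv.bgz", "rt"), since block-gzipped files can be read as ordinary gzip. The one-line export above is still the simpler route if you have the MatrixTable at hand.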

Thank you so much! It works!
If I had known, I would have asked sooner. I spent more than a week trying to solve this problem! :P
Next time I'll know I can count on the fast support of the Hail team :)

The more questions we get, the more information we have about how we can improve our docs! I promise we’ll actually start using that information to improve them soon…