Long time to export UK Biobank GWAS result to tsv file

chenll0105 · April 25, 2020, 7:24am

When I use the following code to export the chr1 GWAS result to a TSV file, it takes hours.

gwas.export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')

The gwas data contains about 7.6 million variants.

How can I speed up the export process? Thanks a lot!

danking · April 25, 2020, 7:32am

Hi! Sorry you’re having a bad experience.

Keep in mind that Hail is lazy. Nothing is computed until you call gwas.export. At that point, Hail performs all the operations that you’ve requested, including the linear regressions!
Table.export with the default parallel=None performs a very slow file concatenation step! Do you really need a text file? You can use Hail itself to analyze those 7.6 million GWAS results. Either way, you should first write to Hail’s efficient on-disk format, then read the table back in and export it to text. This runs the GWAS computation once and stores the result on disk; the export then only has to read the stored table and convert it to text.

gwas.write('chr1_gwas.ht')
hl.read_table('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')

The UKB is a large dataset! It might take some time to compute linear regression on the biggest chromosome.
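The write-then-export pattern above can also be wrapped in a small helper. This is only a sketch: the function name checkpoint_then_export is made up for illustration and is not part of Hail, and overwrite=True is my choice, not something from the question. It relies on Table.checkpoint, which writes the table and reads it back in one call, and on the parallel argument of Table.export. Passing parallel='header_per_shard' skips the concatenation step, but the output is then a folder of file shards rather than a single file.

```python
def checkpoint_then_export(table, ht_path, tsv_path, parallel=None):
    """Materialize a Hail Table once, then export the stored copy as text.

    `table` is expected to be a hail.Table (for example the `gwas` table
    from the question). The helper itself is hypothetical, not a Hail API.
    """
    # checkpoint() writes the table to `ht_path` and returns a table that
    # reads from that file, so the expensive regression runs only once.
    # overwrite=True (an assumption) replaces any existing table at ht_path.
    saved = table.checkpoint(ht_path, overwrite=True)

    # parallel=None produces one file but includes the slow concatenation.
    # parallel='header_per_shard' or 'separate_header' writes a folder of
    # shards instead and avoids that step.
    saved.export(tsv_path, header=True, delimiter='\t', parallel=parallel)
    return saved


# Usage, assuming `gwas` is the Hail Table from the question:
#   saved = checkpoint_then_export(
#       gwas, 'chr1_gwas.ht', 'chr1_gwas.tsv.bgz',
#       parallel='header_per_shard')
```

If a single file is required, leave parallel=None; the regression still only runs once, so the remaining cost is the text conversion and concatenation.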

chenll0105 · April 27, 2020, 1:35am

Thanks a lot, it does help me out!

gwas.read('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')

The code above should be changed to:

hl.read_table('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')
