Long time to export UK Biobank GWAS results to a TSV file

When I use the following code to export my chr1 GWAS results to a TSV file, it takes hours.

gwas.export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')

The GWAS results contain about 7.6 million variants.

How can I speed up the export? Thanks a lot!

Hi! Sorry you’re having a bad experience.

  1. Keep in mind that Hail is lazy. Nothing is computed until you call gwas.export. At that point, Hail performs all the operations you’ve requested, including the linear regressions!
  2. Table.export with the default parallel=None setting concatenates the header and all partitions into a single file, and that concatenation step is very slow! Do you really need a text file? You can use Hail itself to analyze those 7.6 million GWAS results. Either way, you should first write to Hail’s fast, efficient on-disk format, then read it back in and convert it to a text file. This runs the GWAS computation once and stores the result on disk; reading it back is then only needed to convert it to text and export it.
gwas.write('chr1_gwas.ht')
hl.read_table('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')
  3. The UKB is a large dataset! It might take some time to compute linear regression on the biggest chromosome.
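The laziness point and the "write first, then export" advice can be seen in a toy, pure-Python analogy. This is not Hail code: a generator stands in for the lazy Hail pipeline, a hypothetical `expensive_stat` function stands in for the per-variant regressions, and a JSON-lines file stands in for Hail’s native table format. The sketch assumes only that each action run directly on a lazy pipeline recomputes everything upstream, while actions on a stored copy just read it back.

```python
import csv
import json
import os
import tempfile

# Counts how many times the (pretend) expensive per-variant step runs.
calls = {"n": 0}


def expensive_stat(variant_id):
    """Hypothetical stand-in for one per-variant regression."""
    calls["n"] += 1
    return variant_id * 0.5


def lazy_results(n_variants):
    """Lazy pipeline: nothing is computed until something iterates over it."""
    for v in range(n_variants):
        yield {"variant": v, "beta": expensive_stat(v)}


def export_tsv(rows, path):
    """Text export, loosely analogous to Table.export."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["variant", "beta"])
        for r in rows:
            writer.writerow([r["variant"], r["beta"]])


def write_once(rows, path):
    """Stand-in for gwas.write(...): store the computed rows (JSON lines here)."""
    with open(path, "w") as f:
        for r in rows:
            f.write(json.dumps(r) + "\n")


def read_back(path):
    """Stand-in for hl.read_table(...): stream stored rows without recomputing."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)


def demo(n_variants=1000):
    with tempfile.TemporaryDirectory() as tmp:
        # Two actions directly on the lazy pipeline: the expensive step runs twice.
        calls["n"] = 0
        export_tsv(lazy_results(n_variants), os.path.join(tmp, "a.tsv"))
        export_tsv(lazy_results(n_variants), os.path.join(tmp, "b.tsv"))
        direct = calls["n"]

        # Write once, then run both actions on the stored copy: it runs once.
        calls["n"] = 0
        stored = os.path.join(tmp, "results.jsonl")
        write_once(lazy_results(n_variants), stored)
        export_tsv(read_back(stored), os.path.join(tmp, "c.tsv"))
        export_tsv(read_back(stored), os.path.join(tmp, "d.tsv"))
        materialized = calls["n"]
    return direct, materialized


if __name__ == "__main__":
    print(demo())  # (2000, 1000)
```

With 1,000 toy variants, exporting twice straight from the lazy pipeline calls the expensive step 2,000 times, while writing once and exporting from the stored copy calls it 1,000 times. In Hail itself, Table.checkpoint is a shorthand for writing a table and reading it back.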

Thanks a lot, it does help me out!

was.read('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')

The code above should be changed to:

hl.read_table('chr1_gwas.ht').export('chr1_gwas.tsv.bgz', header=True, delimiter='\t')