GWAS on a subset of UK Biobank

dmoracze · June 1, 2021, 5:08pm

@danking I want to run 79,800 separate GWASes, distributed across our cluster. Which approach do you think is more efficient?

Merge the geno matrix table with the full pheno table, then write a script that runs a GWAS on a single annotation (e.g., a single column from the pheno table)

or

Write a script that subsets the pheno table to a single column on each iteration, merges that subset with the geno matrix table, and then runs the GWAS?

Either way I’ll need a way to preserve the pheno column index, so I’ll likely write a script that takes a column index as an argument.
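
Something like this is what I have in mind for the index handling. It is only a rough sketch: it assumes the pheno table is a tab-delimited file with a header row and a sample ID column called `eid`, and the function names (`phenotype_name_for_index`, `read_header`, `main`) are placeholders I made up. It only resolves a column index to a column name, so the same index always maps to the same phenotype. The GWAS itself is not shown.

```python
import argparse
import csv


def phenotype_name_for_index(header, index, key_column="eid"):
    """Map a 0-based phenotype index to its column name.

    The key column (assumed here to be "eid") is skipped, so index 0 is
    the first phenotype column rather than the sample ID.
    """
    phenotypes = [name for name in header if name != key_column]
    if not 0 <= index < len(phenotypes):
        raise IndexError(
            f"column index {index} is out of range for {len(phenotypes)} phenotypes"
        )
    return phenotypes[index]


def read_header(path, delimiter="\t"):
    """Read only the header row, so the full pheno table is not loaded."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter))


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Resolve a pheno column index to a column name for one GWAS job."
    )
    parser.add_argument("pheno_tsv", help="tab-delimited pheno table with a header row")
    parser.add_argument("column_index", type=int, help="0-based phenotype column index")
    args = parser.parse_args(argv)

    name = phenotype_name_for_index(read_header(args.pheno_tsv), args.column_index)
    # Print index and name together so each job's output can be traced
    # back to its pheno column.
    print(f"{args.column_index}\t{name}")
    return name
```

Each cluster job would then call `main()` with its own index, for example by passing the scheduler's array task ID as `column_index`, and use the returned name to pick the phenotype.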
