I’m trying to run a GWAS (logistic regression) on about 369,000 subjects using UKBB data on Google Cloud. After filtering out SNPs based on info score and minor allele frequency, I expect to have about 1 million SNPs per chromosome.
When I create my cluster, how should I determine the most cost-effective number of preemptible workers? And when I import the BGEN file with the import_bgen function, how many partitions would be optimal?
Any tips would be greatly appreciated. Thanks!