I’m running a GWAS on UK Biobank data. Each chromosome has its own BGEN file. To reduce the amount of computation, I only load certain variants from each BGEN file, and I have a separate variant list file for each chromosome.
Would it be more computationally efficient to run each chromosome separately and then combine the results at the end, or should I try to load all the BGEN files at once?
When I run each chromosome separately, it seems that a few partitions always lag at the end of the computation, resulting in inefficient use of computational resources.
Can the “import_bgen” function handle multiple BGEN files and multiple variant lists at the same time (i.e. using the n-th variant list to select certain loci from the n-th BGEN file)? I could combine all the variants into one list if necessary.
And if importing all the BGEN files at once would work, would the enormous size of the resulting matrix table cause problems?
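
For concreteness, here is a minimal sketch of the single-import approach I have in mind. The file names, the `variant` column, and the helper name `import_selected_variants` are all hypothetical. It assumes each variant list is a tab-separated file with a header and a `variant` column formatted as `contig:pos:ref:alt`, and that every BGEN file has already been indexed with `hl.index_bgen`.

```python
# Hypothetical file layout: one BGEN file and one variant list per autosome.
CHROMOSOMES = [str(c) for c in range(1, 23)]
bgen_paths = [f"ukb_chr{c}.bgen" for c in CHROMOSOMES]
variant_list_paths = [f"variants_chr{c}.tsv" for c in CHROMOSOMES]


def import_selected_variants(bgen_paths, variant_list_paths, sample_file):
    """Import all BGEN files in one call, keeping only the listed variants."""
    import hail as hl  # imported here so the path setup above runs without Hail

    # Assumption: each list has a 'variant' column like '1:12345:A:G'.
    # All lists are read into a single table keyed by locus and alleles.
    variants = hl.import_table(variant_list_paths)
    variants = variants.key_by(**hl.parse_variant(variants.variant))

    # Pass the list of BGEN paths and the combined variant table together.
    return hl.import_bgen(
        bgen_paths,
        entry_fields=["GT"],
        sample_file=sample_file,
        variants=variants,
    )
```

Note that this combines the per-chromosome lists into one table rather than pairing the n-th list with the n-th file. Whether the pairing can be expressed directly is part of what I am asking.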