Best practices for UK Biobank Imputed Data

Hey @vforget!

Yes, QC, PRS, GWAS, etc. on very large datasets are exactly the operations Hail was designed for.

Can you share the code you used? It sounds like you might be trying to copy all the genotypes into a new file, which is slow and expensive. Hail is designed so that you never need to modify or copy the genotype data. When you produce new variant or sample annotations (QC metrics, for example), save them separately from the genotypes. In particular, if your data is already stored in a BGEN file, you should never use MatrixTable.write.