Each time I run a Hail GWAS in All of Us with array data, I run into issues when saving a checkpoint or when running logistic regression. I believe my code is concise, and given the size of the array data in the latest release (v7), it should be manageable. However, the program consistently gets stuck, usually at (0 + 48) / 74:
2023-10-03 16:24:19.862 Hail: INFO: logistic_regression_rows: running wald on 6529 samples for response variable y, with input variable x, and 5 additional covariates... [Stage 4:> (0 + 48) / 74]
The array data I used was:
# read in microarray data
mt = hl.read_matrix_table('gs://fc-aou-datasets-controlled/v7/microarray/hail.mt/')
After annotating this matrix table with my phenotype/ancestry file, conducting some quick QC, and attempting to run logistic regression on each variant, the program gets stuck.
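For context, the pipeline up to that point looks roughly like this (a minimal sketch, not my exact code; the phenotype table path, field names, and QC thresholds are placeholders):

```python
import hail as hl

# hypothetical phenotype/ancestry table keyed by sample ID 's' (placeholder path)
pheno = hl.import_table('gs://my-bucket/pheno.tsv', key='s', impute=True)

# annotate samples with phenotype and covariates
mt = mt.annotate_cols(pheno=pheno[mt.s])

# quick QC: drop low-call-rate and rare variants (thresholds are illustrative)
mt = hl.variant_qc(mt)
mt = mt.filter_rows((mt.variant_qc.call_rate > 0.95) & (mt.variant_qc.AF[1] > 0.01))

# per-variant Wald logistic regression
results = hl.logistic_regression_rows(
    test='wald',
    y=mt.pheno.case_status,                       # placeholder response field
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0, mt.pheno.age, mt.pheno.sex]  # placeholder covariates
)
```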
However, if I randomly select some variants across the genome to run the GWAS, the program runs smoothly and at a good speed.
# only use a subset of intervals
mt = hl.filter_intervals(mt, [hl.parse_locus_interval(x) for x in allintervals[1:4000]])
2023-10-03 16:21:06.348 Hail: INFO: logistic_regression_rows: running wald on 6529 samples for response variable y, with input variable x, and 5 additional covariates...
2023-10-03 16:22:20.280 Hail: INFO: wrote table with 745 rows in 64 partitions to /tmp/persist_table4hGORAWhdh
    Total size: 45.17 KiB
    * Rows: 45.15 KiB
    * Globals: 11.00 B
    * Smallest partition: 1 rows (125.00 B)
    * Largest partition: 26 rows (1.47 KiB)
So I suspect the size of the full matrix table is the cause of the issue. How can I resolve this?
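One thing I have been considering (a sketch only; I am not sure it actually addresses the hang) is checkpointing the QC'd table and increasing the partition count before the regression, so that each task handles a smaller slice of the data. The bucket path and partition count below are placeholders:

```python
# materialize the filtered table so logistic regression reads a checkpoint
# rather than recomputing the QC pipeline (path is a placeholder)
mt = mt.checkpoint('gs://my-bucket/qc_checkpoint.mt', overwrite=True)

# spread the rows over more partitions; 500 is an arbitrary illustrative value
mt = mt.repartition(500)
```

Would something like this be the recommended approach, or is there a better way to handle the full v7 array matrix table?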