I am trying to run Hail on a Spark cluster on AWS (we don’t have access to GCS) and am running into all kinds of issues. For example, VEP annotation just fails without any error message other than this:
[Stage 0:==> (2438 + 24) / 54195]
[Stage 0:==> (2438 + 21) / 54195]
Command exiting with ret '1'
Since most people here seem to be running on GCS/Terra, I am looking for kindred spirits to help each other debug and optimize on AWS EMR clusters. I got most of it working except for the VEP annotation of the full UKBB 200K dataset, but I am not sure whether I just need bigger clusters, etc.
Ping me if you are interested in running Hail on an EMR cluster on AWS!