Hi Hail team,
We recently tried to filter a 400K-sample VDS down to 10K samples and export it as a Hail MatrixTable. We hit problems at this scale in a Terra Jupyter notebook backed by a Spark cluster: with 400 primary workers plus 100 preemptible workers, the job still hadn't finished after 12 hours. We also tried running with only primary workers, and it still didn't go through. We ultimately did the conversion via a different route, so it isn't an immediate issue anymore, but we thought it might be helpful to share this experience for future development. We've used a similar process at smaller sample sizes without trouble, so we're not sure where the scaling bottleneck is. Thanks for any advice or future improvements.
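In case it's useful, this is roughly the pipeline we ran (the paths and the sample-list file below are placeholders, not our real ones):

```python
import hail as hl

hl.init()  # Spark-backed Hail on the Terra cluster

# Placeholder paths -- substitute real bucket locations
vds = hl.vds.read_vds('gs://my-bucket/dataset.vds')
samples_to_keep = hl.import_table('gs://my-bucket/keep_10k_samples.tsv', key='s')

# Subset the 400K-sample VDS down to the ~10K samples of interest
vds = hl.vds.filter_samples(vds, samples_to_keep, keep=True,
                            remove_dead_alleles=True)

# Densify to an ordinary MatrixTable and write it out
mt = hl.vds.to_dense_mt(vds)
mt.write('gs://my-bucket/subset_10k.mt', overwrite=True)
```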
What size of workers did you use? I'm also trying to understand Hail scaling and whether we should use larger workers or more of them.