Spin up AWS EMR clusters with Hail


Hello Hail community,

The purpose of this thread is to share an AWS CloudFormation tool that we put together at Harvard Medical School, in collaboration with the Hail development team, to easily deploy EMR clusters with Hail installed: https://github.com/hms-dbmi/hail-on-AWS-spot-instances. One of the tool's main features is that the EMR cluster uses spot instances for the worker nodes, which makes the clusters cost-effective. You can spin up a cluster with your choice of EC2 instance types, subnet, and security group configuration. Assuming you fulfill the prerequisites explained in the repo and your AWS account allows EMR creation, you can spin up clusters with the latest version of Hail and Jupyter Notebook installed in just 3 simple steps:

  1. Cloning the repo

  2. Editing a .yaml configuration file

  3. Executing a shell script

The script takes care of all the setup, including the elements and adjustments needed to bring dropped or purposely added worker nodes back to life when necessary (similar to what Google does with Dataproc). Please feel free to take a look and play with the tool. Your feedback is more than welcome.
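
For readers who have not worked with spot instances on EMR before, here is a minimal Python sketch of the idea. It is not taken from the tool, which drives everything through CloudFormation and the .yaml file. It only builds the instance-group description in the shape that boto3's EMR `run_job_flow` call accepts, and it does not contact AWS. The function name `emr_instance_groups`, the group names, the instance types, the bid price, and the choices to keep the master node on-demand and to use the `CORE` role for the workers are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def emr_instance_groups(
    master_type: str,
    worker_type: str,
    worker_count: int,
    bid_price: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Describe one on-demand master group and one spot worker group.

    Hypothetical helper: the returned list follows the InstanceGroups
    shape used by boto3's EMR run_job_flow, but nothing is sent to AWS.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    # Assumption: the master stays on-demand so it is not reclaimed.
    master: Dict[str, object] = {
        "Name": "master",
        "InstanceRole": "MASTER",
        "Market": "ON_DEMAND",
        "InstanceType": master_type,
        "InstanceCount": 1,
    }

    # Workers are requested on the spot market, which is where the
    # cost savings come from.
    workers: Dict[str, object] = {
        "Name": "workers",
        "InstanceRole": "CORE",
        "Market": "SPOT",
        "InstanceType": worker_type,
        "InstanceCount": worker_count,
    }
    # The bid price is optional; it is only included when one is given.
    if bid_price is not None:
        workers["BidPrice"] = bid_price

    return [master, workers]


# Example values only; pick the EC2 types that suit your workload.
groups = emr_instance_groups("m4.large", "r4.4xlarge", 4, bid_price="0.50")
for group in groups:
    print(group["InstanceRole"], group["Market"],
          group["InstanceType"], group["InstanceCount"])
```

With the tool itself you do not need to write anything like this: the equivalent choices are made in the .yaml configuration file described in the repo.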