Anyone also trying to run Hail on AWS EMR clusters and having issues? Let's huddle

I am trying to run Hail on a Spark cluster on AWS (we don’t have access to GCS) and am running into all kinds of issues. In particular, the VEP annotation step just fails without any error message, other than this:

    [Stage 0:==>                                                (2438 + 24) / 54195]
    [Stage 0:==>                                                (2438 + 21) / 54195]
    Command exiting with ret '1'

Since most people here seem to be running on GCS/Terra, I am looking for kindred spirits to help each other debug and optimize on AWS EMR clusters… I got most of it working except for the VEP annotation of the full UKBB 200K dataset, but I am not sure whether I just need a bigger cluster.
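
For context, the failing step is essentially a call like this (a minimal sketch, not my exact pipeline: the bucket paths and the VEP config location are placeholders):

    import hail as hl

    # Sketch: initialize Hail on the EMR-managed Spark cluster, then run VEP.
    # All paths below are placeholders, not my actual locations.
    hl.init(default_reference='GRCh38')

    mt = hl.read_matrix_table('s3://my-bucket/ukbb_200k.mt')
    mt = hl.vep(mt, 'file:///vep_data/vep-config.json')  # VEP config staged on every node
    mt.write('s3://my-bucket/ukbb_200k.vep.mt', overwrite=True)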

Ping me if you are interested in running Hail on an EMR cluster on AWS!

Thon

Hi,

We were doing a similar analysis. However, we were facing a different error.


Yes, I ran into this as well… I set it to 10,000, but you are saying it only needs to be raised slightly?

And are you talking about “fs.s3.maxConnections”, or a different setting?

Here is my current Spark config setting…

    def _get_spark_conf(self):
        '''The Spark configurations needed for Hail, Glow, and Delta Lake on Spark,
        in EMR classification format.'''
        s = [
            {
                # Register the Hail JAR and its Kryo serializer with Spark.
                "Classification": "spark-defaults",
                "Properties": {
                    "spark.jars": "/usr/local/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar",
                    "spark.kryo.registrator": "is.hail.kryo.HailKryoRegistrator",
                    "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
                }
            },
            {
                # Let EMR size the executors and driver to use the whole cluster.
                "Classification": "spark",
                "Properties": {
                    "maximizeResourceAllocation": "true"
                }
            },
            {
                # Disable YARN's virtual-memory check, which can kill healthy JVM containers.
                "Classification": "yarn-site",
                "Properties": {
                    "yarn.nodemanager.vmem-check-enabled": "false"
                }
            },
            {
                # Raise EMRFS's S3 connection-pool limit.
                "Classification": "emrfs-site",
                "Properties": {
                    "fs.s3.maxConnections": "10000"
                }
            }
        ]
        return s
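
In case it is useful, here is roughly how that classification list plugs into an EMR launch request via boto3 (a minimal sketch, not our production launcher: the release label, instance types, counts, and IAM roles are placeholders):

    import boto3

    spark_conf = [...]  # the classification list from _get_spark_conf() above

    emr = boto3.client('emr')
    emr.run_job_flow(
        Name='hail-emr',                     # placeholder cluster name
        ReleaseLabel='emr-5.31.0',           # placeholder EMR release
        Applications=[{'Name': 'Spark'}],
        Configurations=spark_conf,           # applied at cluster creation
        Instances={
            'MasterInstanceType': 'm5.xlarge',   # placeholder instance types
            'SlaveInstanceType': 'r5.4xlarge',
            'InstanceCount': 10,
            'KeepJobFlowAliveWhenNoSteps': True,
        },
        JobFlowRole='EMR_EC2_DefaultRole',
        ServiceRole='EMR_DefaultRole',
    )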

Show me yours! :slight_smile:

Thon

Thanks for the code snippet; it helps me understand your setup better.

Yes, I am talking about fs.s3.maxConnections. We had to set it to roughly 1.2 × the total number of cores in the cluster.
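
Generically, that rule of thumb looks like this (a sketch only; the helper name and the 20-node r5.4xlarge example are illustrative, not from our client setup):

    # Rule of thumb: fs.s3.maxConnections ≈ 1.2 × total cores across the cluster.
    def s3_max_connections(instance_count, cores_per_instance):
        return str(int(1.2 * instance_count * cores_per_instance))

    emrfs_site = {
        "Classification": "emrfs-site",
        "Properties": {
            # e.g. 20 r5.4xlarge workers × 16 vCPUs -> "384"
            "fs.s3.maxConnections": s3_max_connections(20, 16),
        },
    }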

I can’t share our Spark configurations here, as they are in our client’s environment.
