Hail 0.2 class not found exception on EMR

danking · July 13, 2018, 3:19pm

Could you share the Hail log from one of these failing runs? I’d like to see what Spark thinks it’s copying and where it is putting the jars.

atebbe · July 13, 2018, 4:58pm

I don’t see how to attach anything other than an image here, so I dumped stdout and stderr from one of the task attempt logs to S3.

Stderr:
https://gfb-external-access.s3.amazonaws.com/stderr.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJABOW7QPSUAFJBZA/20180713/us-east-1/s3/aws4_request&X-Amz-Date=20180713T165750Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=8800faa5de50d3402077e4877513cbae38a40441adb178455aae1109b1c7d604

Stdout:
https://gfb-external-access.s3.amazonaws.com/stdout.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJABOW7QPSUAFJBZA/20180713/us-east-1/s3/aws4_request&X-Amz-Date=20180713T165815Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=f604a2571be9a8d2dddc1c6685f17e62c9dc799b1dae07f80b140ab6dadcbde5

atebbe · July 17, 2018, 1:09pm

We were able to get past this by manually distributing the jar to all of the cluster nodes and adding the absolute path to the jar to the classpath variables in spark-defaults.
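For readers following along, here is a minimal sketch of what such spark-defaults entries might look like. The post does not say which properties were edited; `spark.driver.extraClassPath` and `spark.executor.extraClassPath` are the standard Spark classpath properties and are assumed here. The jar location and the helper name `classpath_entries` are hypothetical.

```python
# Sketch: render spark-defaults.conf entries that put a locally copied
# Hail jar on the driver and executor classpaths.
# ASSUMPTION: the thread does not name the properties that were edited;
# these two are the standard Spark classpath properties.

def classpath_entries(jar_path, existing=""):
    """Return spark-defaults.conf lines that add jar_path to both classpaths.

    `existing` is any classpath already configured for these properties;
    the jar is appended to it with ':' instead of replacing it.
    """
    if not jar_path.startswith("/"):
        raise ValueError("use an absolute path that exists on every node")
    value = f"{existing}:{jar_path}" if existing else jar_path
    return [
        f"spark.driver.extraClassPath {value}",
        f"spark.executor.extraClassPath {value}",
    ]

# Hypothetical location; the jar must sit at the same path on every node.
HAIL_JAR = "/home/hadoop/hail-all-spark.jar"

for line in classpath_entries(HAIL_JAR):
    print(line)
```

If spark-defaults already sets these properties, pass the current value as `existing` so the entries already configured are kept.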

tpoterba · July 17, 2018, 1:15pm

That’s pretty horrible. Maybe we should talk to the Spark people about this stuff?

tpoterba · July 17, 2018, 1:16pm

Things like addJar not being exposed in Python, not doing the right thing for S3 paths, etc.

cdeniz · August 17, 2018, 8:34pm

Hi Tim,

Is there a way to compile our own files with gradlew so we can get the latest Hail 0.2 version using Spark 2.3.0?

I used your .jar and .zip files (version ae9e34fb3cbf) and they do work on emr-5.13.0 with Spark 2.3.0; we just want to get Hail with the latest updates.

Thanks,

Carlos

tpoterba · August 18, 2018, 2:03pm

It’s totally possible to compile your own! I haven’t done it in a while (since making that .jar and .zip) so I could be wrong about specifics, but all you need to do is pass versions for Spark, Breeze, and py4j:

./gradlew -Dspark.version=2.3.0 -Dbreeze.version=0.13.2 -Dpy4j.version=0.10.6 shadowJar archiveZip

I just looked up the Breeze / py4j versions for Spark 2.3.0, so these should be correct.
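If the build needs to be repeated for other Spark versions, the command above can be parameterized. The following is only a sketch: the helper name `gradle_build_command` is hypothetical, and the version numbers are the ones quoted in this post. The right Breeze and py4j versions for any other Spark release would have to be looked up separately.

```python
import shlex

def gradle_build_command(spark, breeze, py4j):
    """Build the argument list for the gradlew invocation quoted above."""
    return [
        "./gradlew",
        f"-Dspark.version={spark}",
        f"-Dbreeze.version={breeze}",
        f"-Dpy4j.version={py4j}",
        "shadowJar",   # builds the Hail jar
        "archiveZip",  # builds the Python zip
    ]

# Versions taken from the post; other Spark releases need their own values.
cmd = gradle_build_command("2.3.0", "0.13.2", "0.10.6")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to `subprocess.run(cmd, cwd=<path to the Hail checkout>, check=True)` to run the build from a script.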

tpoterba · August 18, 2018, 2:09pm

Also note that you’ll need to compile on the same OS that the EMR VMs are using.

cdeniz · August 19, 2018, 2:46am

Thanks, Tim. I’ll give it a shot.

cdeniz · August 20, 2018, 5:52am

It worked, np. Thanks Tim!
