Hail on AWS EMR 6.0 (Scala 2.12)

Our team uses Hail 0.2 on AWS EMR for many analyses, and it’s been an excellent tool for us. AWS recently released EMR 6.0, and we’re considering upgrading our infrastructure. We can build an AMI that contains Hail and launch a cluster, but we get the following error when we call hl.read_matrix_table:

An error was encountered:
An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at is.hail.backend.spark.SparkBackend$.majorMinor$1(SparkBackend.scala:57)
at is.hail.backend.spark.SparkBackend$.checkSparkCompatibility(SparkBackend.scala:59)
at is.hail.backend.spark.SparkBackend$.createSparkConf(SparkBackend.scala:70)
at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:119)
at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:195)
at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "", line 2, in read_matrix_table
File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 614, in wrapper
return original_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/hail/methods/impex.py", line 1936, in read_matrix_table
for rg_config in Env.backend().load_references_from_dataset(path):
File "/usr/local/lib/python3.7/site-packages/hail/utils/java.py", line 58, in backend
return Env.hc()._backend
File "/usr/local/lib/python3.7/site-packages/hail/utils/java.py", line 46, in hc
init()
File "", line 2, in init
File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 614, in wrapper
return original_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/hail/context.py", line 228, in init
skip_logging_configuration, optimizer_iterations)
File "/usr/local/lib/python3.7/site-packages/hail/backend/spark_backend.py", line 193, in init
jsc, app_name, master, local, True, min_block_size, tmpdir, local_tmpdir)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at is.hail.backend.spark.SparkBackend$.majorMinor$1(SparkBackend.scala:57)
at is.hail.backend.spark.SparkBackend$.checkSparkCompatibility(SparkBackend.scala:59)
at is.hail.backend.spark.SparkBackend$.createSparkConf(SparkBackend.scala:70)
at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:119)
at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:195)
at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

I believe the issue is that Hail is built with Scala 2.11.x, and EMR 6.0 uses Scala 2.12.x. Can Hail be built with Scala 2.12? Has anyone else experienced similar issues?
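
One way to confirm this kind of mismatch is to compare the Scala binary version (major.minor) that the JAR was built for against the version the cluster's JVM reports. The following is a minimal sketch, not part of Hail: the helper names are hypothetical, and the "built for" version has to come from your own build settings. The runtime lookup assumes a live PySpark SparkContext and reaches scala.util.Properties through py4j.

```python
def scala_binary_version(full_version: str) -> str:
    """Return the 'major.minor' part of a Scala version, e.g. '2.12.10' -> '2.12'."""
    parts = full_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"not a Scala version string: {full_version!r}")
    return ".".join(parts[:2])


def is_binary_compatible(built_for: str, runtime: str) -> bool:
    """Scala 2.x minor releases (2.11 vs 2.12) are not binary compatible,
    so a JAR only loads cleanly when the major.minor parts match."""
    return scala_binary_version(built_for) == scala_binary_version(runtime)


def runtime_scala_version(spark_context) -> str:
    """Ask the driver JVM which Scala library it is running.

    Assumes `spark_context` is a live pyspark.SparkContext; `_jvm` is the
    py4j gateway view that PySpark exposes (a private attribute).
    """
    return spark_context._jvm.scala.util.Properties.versionNumberString()


if __name__ == "__main__":
    # Hypothetical values for illustration: a JAR built for 2.11 on a 2.12 runtime.
    print(is_binary_compatible("2.11.12", "2.12.10"))
```

On a cluster, passing the result of runtime_scala_version(sc) as the second argument shows whether the JAR and the Spark installation agree before Hail is initialized.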

This is on the roadmap, but I don’t think it will happen in the next month. It’s probably not terribly hard to switch the default compilation to Scala 2.12 and fix the errors that arise, but keeping the build compatible with both Scala versions may be harder. We’ll probably make this change when Google Dataproc makes a GA release with Scala 2.12 / Spark 3.

Thanks for the quick response!