Hail no longer supports Spark 2.?

I tried to compile Hail from source, but I get the message below when running this command:
$ make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.7

Hail no longer supports Spark 2.

Surely that is not true (yet?)…

Indeed, I tried it with Spark 3.1.1 and did not get the message, but it’s only version 0.2.64… Surely a change this big would be something for 0.3?

I looked at the Gradle build file and indeed I see this:

project.ext {
    cachedBreezeVersion = null

    sparkVersion = System.getProperty("spark.version", "3.1.1")
    if (sparkVersion.startsWith("2.")) {
        throw new UnsupportedOperationException("Hail no longer supports Spark 2.")
    }
    else if (sparkVersion != "3.1.1") {
        project.logger.lifecycle("WARNING: Hail only tested with Spark 3.1.1, use other versions at your own risk.")
    }
    scalaVersion = System.getProperty("scala.version", "2.12.13")
    if (!scalaVersion.startsWith("2.12.")) {
        throw new UnsupportedOperationException("Hail currently only supports Scala 2.12")
    }
    scalaMajorVersion = (scalaVersion =~ /^\d+.\d+/)[0]
}
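For anyone who wants to predict what that gate does for a given pair of versions before starting a long build, here is a minimal Python sketch of the same checks. It only restates the snippet quoted above; the function name `check_versions` and the choice to return warnings as a list are my own assumptions for illustration, not anything from the Hail build.

```python
import re

# The only Spark version the quoted build file treats as tested.
TESTED_SPARK = "3.1.1"


def check_versions(spark_version="3.1.1", scala_version="2.12.13"):
    """Restate the checks from the quoted Gradle snippet (hypothetical helper).

    Returns (scala_major_version, warnings). Raises ValueError in the cases
    where the Gradle script throws UnsupportedOperationException.
    """
    warnings = []
    if spark_version.startswith("2."):
        raise ValueError("Hail no longer supports Spark 2.")
    elif spark_version != TESTED_SPARK:
        warnings.append(
            "WARNING: Hail only tested with Spark 3.1.1, "
            "use other versions at your own risk."
        )
    if not scala_version.startswith("2.12."):
        raise ValueError("Hail currently only supports Scala 2.12")
    # Same idea as the Groovy regex match: keep only "major.minor".
    scala_major = re.match(r"^\d+\.\d+", scala_version).group(0)
    return scala_major, warnings
```

With this sketch, `check_versions("3.0.1", "2.12.8")` passes with one warning, while `check_versions("2.4.7", "2.11.12")` (the versions from my make command) fails on the Spark check before the Scala check is ever reached.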

Pretty radical change in a point release, no?

Which version of Hail still supports Spark 2.4.*?
I want to use AWS EMR, and they still run Spark 2.4.0.

The latest release of Hail (0.2.64) still supports Spark 2, but currently our plan is that the next one won’t. We change the version number right before we release, so if you build from GitHub it will still show the version number of the last release.

Spark 2 was keeping us from upgrading Python (PySpark on Spark 2 only supports up to Python 3.7), Java (Spark 2 locks in at Java 8), and Scala (Spark 2 only supports Scala 2.11).

It seems like EMR supports Spark 3.0.1; I didn’t realize they don’t support 3.1.1 yet. Would that be enough to unblock you?

I did not realize I was still using an older EMR release… Trying 6.2.0 now with Spark 3.0.1, so that should work. I’ll let you know if not…

Yeah, as I feared, trying to compile with Spark 3.0.1 gives compile errors… Mostly the same one:

10 times this one:

/home/hadoop/hail/hail/src/main/scala/is/hail/expr/ir/AbstractMatrixTableSpec.scala:24: too many arguments (2) for method apply: (hints: List[Class[_]])org.json4s.ShortTypeHints in object ShortTypeHints
Note that 'typeHintFieldName' is not a parameter name of the invoked method.
      classOf[RelationalSpec], classOf[MatrixTableSpec], classOf[TableSpec]), typeHintFieldName="name")

and these two:

/home/hadoop/hail/hail/src/main/scala/is/hail/expr/ir/functions/RelationalFunctions.scala:137: No org.json4s.Formats found. Try to bring an instance of org.json4s.Formats in scope or use the org.json4s.DefaultFormats.
    (jv \ "name").extract[String] match {
                         ^
/home/hadoop/hail/hail/src/main/scala/is/hail/expr/ir/functions/RelationalFunctions.scala:141: No org.json4s.Formats found. Try to bring an instance of org.json4s.Formats in scope or use the org.json4s.DefaultFormats.
        jv.extract[T]
                  ^

I guess I can try to compile for 3.1.1, but I’m not sure how that works with the Spark 3.0.1 already present on AWS EMR. I want to make sure I can use the S3 FS etc., and that did not seem to work if I compile for 3.1.1…

Will talk more on Monday and find a solution

I agree that compiling for 3.1.1 when Amazon doesn’t support it will cause problems, so don’t do that. I’m working on addressing the compilation issue with 3.0.1. In the meantime, you can continue to use Hail 0.2.64 as provided on pip with Spark 2.

I was able to compile 0.2.63 with Spark 3.0.1 and Scala 2.12.8, so I am good for now, but it would be great if I could use the most recent version.

0.2.64’s release (tagged here: Release 0.2.64 · hail-is/hail · GitHub) should support the same things as 0.2.63 does. The current main branch on GitHub is unreleased work in progress for the eventual 0.2.65.

@thondeboer This PR should fix building Hail with Spark 3.0.1, as well as for Spark 2. [query] Support older versions of Spark by johnc1231 · Pull Request #10254 · hail-is/hail · GitHub

@thondeboer The current hail main branch should work for Spark 2, Spark 3.0.1, and Spark 3.1.1, which should allow you to use both of the EMR versions you tried.