Hi.
I am installing Hail v0.2.60 on Spark 3.0.0 / Scala 2.12.10:
sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.10 SPARK_VERSION=3.0.0
I am able to load, process, and write data without issues. The problem arises when I try to export my table to Elasticsearch: I get an error that looks like an incompatibility with Scala 2.12.
import hail as hl

# Load table
ht_res = hl.read_table('s3://[...].ht')

hl.export_elasticsearch(
    ht_res,
    '[es-URL]',
    [es-port],
    '[index]',
    'documents',
    100,
    config={
        'es.nodes.wan.only': 'true',
        'es.batch.write.retry.wait': '60s',
        'es.batch.write.retry.count': '30',
    },
    verbose=True,
)
Hail version: 0.2.60-de1845e1c2f6
Error summary: NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
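For reference, a quick way to confirm which Scala version the Spark JVM is actually running (a diagnostic sketch; _jvm is a py4j internal, so treat it as a debugging aid only):

import pyspark

# Ask the running Spark JVM which Scala version it was built with.
# `_jvm` is a py4j internal, so this is only a debugging aid.
spark = pyspark.sql.SparkSession.builder.getOrCreate()
print(spark.sparkContext._jvm.scala.util.Properties.versionString())
# e.g. "version 2.12.10" on this Spark 3.0.0 cluster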
In the meantime, I am able to install Hail v0.2.60 on Spark 2.4.6 / Scala 2.11.12:
sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.6
In this context, the hl.export_elasticsearch() code above runs without issue.
You’re right, this is a mistake. We hard-code the path to the Elasticsearch dependency to use Spark 2 and Scala 2.11.
I made a GitHub issue here: https://github.com/hail-is/hail/issues/9767
and assigned it to myself. I will try to address it in the next few days.
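In the meantime, the only general workaround I can think of is pointing Spark at a connector build that matches the cluster’s Scala binary version, e.g. via spark.jars.packages. Below is a sketch, assuming a matching artifact exists; the coordinate shown is the Spark 2.x / Scala 2.11 one (version chosen for illustration), and as discussed below there is no Spark 3 / Scala 2.12 equivalent yet.

import pyspark

# Hypothetical sketch: attach an elasticsearch-hadoop Spark connector that
# matches the cluster's Scala binary version. The coordinate below is the
# Spark 2.x / Scala 2.11 artifact (version chosen for illustration); no
# Spark 3 / Scala 2.12 equivalent exists at the time of this thread.
spark = (
    pyspark.sql.SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.elasticsearch:elasticsearch-spark-20_2.11:7.8.1",
    )
    .getOrCreate()
)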
I’m actually not sure how to do this. The library we use (the elasticsearch-hadoop Spark connector) is explicitly built for Spark 2.x.
I’ve done some quick googling and didn’t immediately find anyone doing this with Spark 3; I’ll have to keep looking.
There’s an open issue for Spark 3 / Scala 2.12 support in the elasticsearch-hadoop connector.
(Linked issue, opened 09:18AM - 09 Jan 20 UTC, labels :Spark, enhancement: “Spark3 is currently in RC. Will there be support for Spark3 in the next release version (v8) or will we…”)
Thanks @nawatts. So I think the answer, then, is that there’s currently no way to export to Elasticsearch from Spark 3, and until there is, Hail likely won’t support it.
Also, https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html lists (in the Apache Spark section) the supported Spark / Scala versions and their corresponding ES-Hadoop artifact IDs.
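For example, elasticsearch-spark-20_2.11 is the build for Spark 2.x with Scala 2.11: the number after the hyphen tracks the Spark line, and the suffix tracks the Scala binary version.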