hl.export_elasticsearch conflict with Scala 2.12

Hi.

I am installing Hail v0.2.60 on Spark 3.0.0 / Scala 2.12.10:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.10 SPARK_VERSION=3.0.0

I am able to load, process and write data without issues.

The problem arises when I try to export my table to Elasticsearch. I get an error that looks like an incompatibility with Scala 2.12.

# Load table
ht_res = hl.read_table('s3://[...].ht')

hl.export_elasticsearch(
    ht_res,
    "[es-URL]",
    [es-port],
    '[index]',
    'documents',
    100,
    config={
        'es.nodes.wan.only':'true',
        'es.batch.write.retry.wait':'60s',
        'es.batch.write.retry.count':'30'
    },
    verbose=True
)

Hail version: 0.2.60-de1845e1c2f6
Error summary: NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

Meanwhile, I am able to install Hail v0.2.60 on Spark 2.4.6 / Scala 2.11.12:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.6

In this setup, the hl.export_elasticsearch() code above runs without issue.

You’re right, this is a mistake. We hard-code the path to the elasticsearch dependency to use Spark 2 and Scala 2.11.
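To illustrate why a dependency hard-coded for Scala 2.11 fails on a Scala 2.12 cluster: Scala minor versions are not binary compatible, and a NoSuchMethodError on scala.Predef$.refArrayOps is a typical symptom of mixing them. Below is a minimal sketch, not part of Hail, that compares the Scala suffix of an artifact ID against the cluster's Scala version. The helper names are hypothetical, and the artifact ID is only an example of the naming convention, not necessarily the exact one Hail pins.

```python
import re


def scala_binary_version(full_version):
    """Return the binary version ('2.12') for a full version ('2.12.10')."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def artifact_scala_suffix(artifact_id):
    """Extract the Scala binary version from a trailing suffix such as '_2.11'.

    Returns None when the artifact ID carries no such suffix.
    """
    match = re.search(r"_(\d+\.\d+)$", artifact_id)
    return match.group(1) if match else None


def is_binary_compatible(artifact_id, cluster_scala_version):
    """True only if the artifact was built for the cluster's Scala binary version."""
    suffix = artifact_scala_suffix(artifact_id)
    return suffix is not None and suffix == scala_binary_version(cluster_scala_version)


# Example artifact ID following the usual '<name>_<scala binary version>' convention.
artifact = "elasticsearch-spark-20_2.11"

print(is_binary_compatible(artifact, "2.11.12"))  # Spark 2.4.6 / Scala 2.11.12 setup
print(is_binary_compatible(artifact, "2.12.10"))  # Spark 3.0.0 / Scala 2.12.10 setup
```

The first check passes and the second fails, which matches the behaviour reported above: the export works on the Scala 2.11 build and breaks on the Scala 2.12 build.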

I made a GitHub issue here: https://github.com/hail-is/hail/issues/9767

and assigned it to myself. I will try to address it in the next few days.

I’m actually not sure how to do this. We use the library here, which is explicitly for Spark 2.x:

I’ve done some quick googling and didn’t immediately find anyone doing this with Spark 3. I’ll have to keep looking.

There’s an open issue for Spark 3 / Scala 2.12 support in the elasticsearch-hadoop connector.

Thanks @nawatts. So I think the answer then is that there’s currently no way to export to Elasticsearch from Spark 3, and until there is, Hail likely won’t support it.

Also, https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html lists (in the Apache Spark section) supported Spark / Scala versions and their corresponding ES-Hadoop artifact ID.