hl.export_elasticsearch conflict with Scala 2.12

Hi.

I am installing Hail v0.2.60 on Spark 3.0.0 / Scala 2.12.10:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.10 SPARK_VERSION=3.0.0

I am able to load, process and write data without issues.

The problem arises when I try to export my table to Elasticsearch. I get an error that looks like an incompatibility with Scala 2.12.

import hail as hl

# Load the table from S3 (path elided)
ht_res = hl.read_table('s3://[...].ht')

hl.export_elasticsearch(
    ht_res,
    "[es-URL]",
    [es-port],
    '[index]',
    'documents',
    100,
    config={
        'es.nodes.wan.only':'true',
        'es.batch.write.retry.wait':'60s',
        'es.batch.write.retry.count':'30'
    },
    verbose=True
)

Hail version: 0.2.60-de1845e1c2f6
Error summary: NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

Meanwhile, I am able to install Hail v0.2.60 on Spark 2.4.6 / Scala 2.11.12:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.6

In this setup, the hl.export_elasticsearch() code above runs without issue.

You’re right, this is a mistake. We hard-code the path to the Elasticsearch dependency so that it uses Spark 2 and Scala 2.11.

I made a GitHub issue here: https://github.com/hail-is/hail/issues/9767

and assigned it to myself. I will try to address it in the next few days.
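
For anyone hitting the same NoSuchMethodError: Scala libraries are built against one binary version (2.11, 2.12, ...), and by convention the artifact name carries that version as a suffix. The sketch below shows how such a mismatch can be spotted before running a job. The helper names are hypothetical, and the artifact ID "elasticsearch-spark-20_2.11" is assumed here to be the Spark 2 / Scala 2.11 connector described above; it is not taken from Hail's build files.

```python
import re


def scala_binary_version(full_version: str) -> str:
    """Return the binary version ('2.12') for a full Scala 2.x version ('2.12.10')."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def artifact_scala_version(artifact_id: str) -> str:
    """Extract the Scala binary version from an artifact ID suffix such as '_2.11'."""
    match = re.search(r"_(\d+\.\d+)$", artifact_id)
    if match is None:
        raise ValueError(f"no Scala suffix in artifact ID: {artifact_id!r}")
    return match.group(1)


def is_binary_compatible(artifact_id: str, cluster_scala_version: str) -> bool:
    """True when the artifact was built for the cluster's Scala binary version."""
    return artifact_scala_version(artifact_id) == scala_binary_version(cluster_scala_version)


# Assumed artifact ID for the Spark 2 / Scala 2.11 connector (illustrative only).
connector = "elasticsearch-spark-20_2.11"

print(is_binary_compatible(connector, "2.12.10"))  # False: the failing Spark 3 setup
print(is_binary_compatible(connector, "2.11.12"))  # True: the working Spark 2.4.6 setup
```

This only compares version suffixes. It does not check whether the connector also supports the Spark major version in use.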

I’m actually not sure how to do this. We use the library here, which is explicitly for spark 2.x:

I’ve done some quick googling and didn’t immediately find anyone doing this with Spark 3, so I’ll have to keep looking.

There’s an open issue for Spark 3 / Scala 2.12 support in the elasticsearch-hadoop connector.

Thanks @nawatts. So I think the answer is that there’s currently no way to export to Elasticsearch from Spark 3, and until there is, Hail likely won’t support it.

Also, https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html lists (in the Apache Spark section) supported Spark / Scala versions and their corresponding ES-Hadoop artifact ID.