hl.export_elasticsearch conflict with Scala 2.12

mhebrard · December 2, 2020, 5:23am

Hi.

I am installing Hail v0.2.60 on Spark 3.0.0 / Scala 2.12.10:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.10 SPARK_VERSION=3.0.0

I am able to load, process and write data without issues.

The problem arises when I try to export my table to Elasticsearch. I get an error that looks like an incompatibility with Scala 2.12.

# Load table
ht_res = hl.read_table('s3://[...].ht')

hl.export_elasticsearch(
    ht_res,
    "[es-URL]",
    [es-port],
    '[index]',
    'documents',
    100,
    config={
        'es.nodes.wan.only':'true',
        'es.batch.write.retry.wait':'60s',
        'es.batch.write.retry.count':'30'
    },
    verbose=True
)

Hail version: 0.2.60-de1845e1c2f6
Error summary: NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

mhebrard · December 2, 2020, 10:02am

In the meantime, I was able to install Hail v0.2.60 on Spark 2.4.6 / Scala 2.11.12:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.6

In this setup, the hl.export_elasticsearch() code above runs without issue.

johnc1231 · December 2, 2020, 2:10pm

You’re right, this is a mistake. We hard code the path to the elasticsearch dependency to use spark 2 and scala 2.11.

I made a github issue here: https://github.com/hail-is/hail/issues/9767

and assigned it to myself. Will try to address in next few days.

johnc1231 · December 8, 2020, 4:17pm

I’m actually not sure how to do this. We use the library here, which is explicitly for spark 2.x:

I’ve done some quick googling and didn’t immediately find anyone doing this with Spark 3, so I’ll have to keep looking.

nawatts · December 8, 2020, 7:48pm

There’s an open issue for Spark 3 / Scala 2.12 support in the elasticsearch-hadoop connector.

johnc1231 · December 8, 2020, 7:55pm

Thanks @nawatts. So I think the answer is that there’s currently no way to export to Elasticsearch from Spark 3, and until there is, Hail likely won’t support it.

nawatts · December 8, 2020, 8:08pm

Also, https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html lists (in the Apache Spark section) supported Spark / Scala versions and their corresponding ES-Hadoop artifact ID.
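
To summarize the situation reported in this thread, here is a minimal sketch of a pre-flight check you could run before calling hl.export_elasticsearch(). It only encodes what was observed above as of December 2020 (works on Spark 2.4.6 / Scala 2.11.12, fails on Spark 3.0.0 / Scala 2.12.10). The function names are hypothetical helpers, not part of Hail, and the rule may no longer hold for later Hail or ES-Hadoop releases.

```python
# Sketch only: these helpers are hypothetical and are NOT part of the Hail API.
# They encode the behavior reported in this thread as of December 2020.


def scala_binary_version(scala_version: str) -> str:
    """Return the 'major.minor' binary version, e.g. '2.12.10' -> '2.12'."""
    parts = scala_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected Scala version string: {scala_version!r}")
    return ".".join(parts[:2])


def es_export_expected_to_work(spark_version: str, scala_version: str) -> bool:
    """True only for the combination the thread reports as working:
    Spark 2.x built against Scala 2.11."""
    spark_major = spark_version.split(".")[0]
    return spark_major == "2" and scala_binary_version(scala_version) == "2.11"


# The two installs described in this thread:
print(es_export_expected_to_work("2.4.6", "2.11.12"))
print(es_export_expected_to_work("3.0.0", "2.12.10"))
```

The version strings here are the same values passed as SPARK_VERSION and SCALA_VERSION to make install-on-cluster, so the check can be driven from whatever you used at build time.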
