Unable to export data to Elasticsearch


#1

Hi,

I am running Spark 2.2.0 and Hail 0.2. I have converted the VCF to a MatrixTable and then to a Table. I am now trying to export that data to Elasticsearch, but I am getting the following error.
(Note: both EMR and Elasticsearch are hosted in AWS.)
>>> mt=l.export_elasticsearch(ht,host='https://xxxxxxx.us-east-1.es.amazonaws.com',port=80,index='singlevcf',index_type='variant',block_size=10000,config=None,verbose=True)
Config Map(es.nodes -> https://xxxxxxx.us-east-1.es.amazonaws.com, es.port -> 80, es.batch.size.entries -> 10000, es.index.auto.create -> true)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2, in export_elasticsearch
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 560, in wrapper
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/methods/impex.py", line 2052, in export_elasticsearch
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/utils/java.py", line 224, in deco
hail.utils.java.FatalError: SSLException: Unrecognized SSL message, plaintext connection?

Java stack trace:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:327)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:97)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:83)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:49)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:47)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:21)
	at is.hail.io.ElasticsearchConnector.export(ElasticsearchConnector.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:124)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:344)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:348)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:158)
	at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:574)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:320)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:97)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:83)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:49)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:47)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:21)
	at is.hail.io.ElasticsearchConnector.export(ElasticsearchConnector.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
	at sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:710)
	at sun.security.ssl.InputRecord.read(InputRecord.java:527)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
	at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:757)
	at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
	at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:478)
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:112)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:344)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:348)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:158)
	at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:574)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:320)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:97)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:83)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:49)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:47)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:21)
	at is.hail.io.ElasticsearchConnector.export(ElasticsearchConnector.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)





Hail version: 0.2.7-e08cc2a17c4a
Error summary: SSLException: Unrecognized SSL message, plaintext connection?

I am able to curl the ES URL from the cluster, though.


#2

Are you certain that your Elasticsearch server is running TLS/SSL on port 80? Port 80 is conventionally reserved for insecure HTTP, not for HTTPS over TLS/SSL. Can you try port=443?
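
If it helps, here is a minimal sketch (assuming Python 3; the hostname below is a placeholder for your endpoint) that checks, independently of Hail, whether a given port answers a TLS handshake at all:

# Minimal TLS probe, not Hail-specific. Python 3; the hostname is a placeholder.
import socket
import ssl
host = 'xxxxxxx.us-east-1.es.amazonaws.com'  # your ES endpoint, without the https:// scheme
for port in (80, 443):
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ssl.create_default_context().wrap_socket(sock, server_hostname=host):
                print('port %d: TLS handshake succeeded' % port)
    except ssl.SSLError as e:
        # a plain-HTTP port typically fails here, much like the "plaintext connection?" error above
        print('port %d: TLS handshake failed: %s' % (port, e))
    except OSError as e:
        print('port %d: could not connect: %s' % (port, e))

If 443 succeeds and 80 fails, the endpoint only speaks TLS on 443.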


#3

I tried that and I get a Bad Request error (I had tried it before as well). Thank you!

>>> t=l.export_elasticsearch(ht,host='xxxxxxxxxxxxx.us-east-1.es.amazonaws.com',port=443,index='singlevcf',index_type='variant',block_size=1000,config=None,verbose=True)
Config Map(es.nodes -> xxxxxxxxxxxxx.us-east-1.es.amazonaws.com, es.port -> 443, es.batch.size.entries -> 1000, es.index.auto.create -> true)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-1000>", line 2, in export_elasticsearch
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 560, in wrapper
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/methods/impex.py", line 2052, in export_elasticsearch
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/utils/java.py", line 224, in deco
hail.utils.java.FatalError: EsHadoopInvalidRequest: [GET] on [] failed; server[18.209.185.175:443] returned [400|Bad Request:]

Java stack trace:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:327)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:97)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:83)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:49)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:47)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:21)
	at is.hail.io.ElasticsearchConnector.export(ElasticsearchConnector.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [] failed; server[18.209.185.175:443] returned [400|Bad Request:]
	at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:424)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:382)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:344)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:348)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:158)
	at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:574)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:320)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:97)
	at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:83)
	at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:49)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:47)
	at is.hail.io.ElasticsearchConnector$.export(ElasticsearchConnector.scala:21)
	at is.hail.io.ElasticsearchConnector.export(ElasticsearchConnector.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)




Hail version: 0.2.7-97fb2a5dd4a1
Error summary: EsHadoopInvalidRequest: [GET] on [] failed; server[18.209.185.175:443] returned [400|Bad Request:]
>>>

#4

According to the Elasticsearch error message, you might try setting 'es.nodes.wan.only'. You can pass extra configuration arguments to hl.export_elasticsearch with the config named argument.

If you SSH to an EMR worker node and try to connect to Elasticsearch directly, without Hail, does it work?
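
For example, something like this (a minimal sketch, assuming Python 3; the URL is a placeholder for your endpoint) run on a worker node should print the cluster name and version if the HTTPS endpoint is reachable:

# Reachability check from an EMR worker: no Hail, no elasticsearch-hadoop involved.
# Python 3; the hostname is a placeholder for your AWS Elasticsearch endpoint.
import json
import urllib.request
url = 'https://xxxxxxx.us-east-1.es.amazonaws.com:443/'
with urllib.request.urlopen(url, timeout=10) as resp:
    info = json.loads(resp.read().decode('utf-8'))
# a healthy endpoint answers GET / with a JSON document containing cluster_name and version.number
print(info.get('cluster_name'), info.get('version', {}).get('number'))

If this fails, the problem is networking or access policy rather than Hail.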


#5

Could you please post the exact command? When I try to change the config parameter, I get this error:

>>> mt=l.export_elasticsearch(ht,host=‘https://xxxxxxxxxxxxx.us-east-1.es.amazonaws.com’,port=443,index=‘singlevcf’,index_type=‘variant’,block_size=1000,config=dict(‘es.nodes.wan.only’,‘true’),verbose=True)

Traceback (most recent call last):

File “<stdin>”, line 1, in <module>

TypeError: dict expected at most 1 arguments, got 2


#6

dict(‘es.nodes.wan.only’,‘true’) is a Python error, not a Hail one. This should be:

{'es.nodes.wan.only': 'true'}

#7

I am getting this

mt=l.export_elasticsearch(ht,host=‘https://xxxxxxxxxxxxxx.us-east-1.es.amazonaws.com’,port=443,index=‘singlevcf’,index_type=‘variant’,block_size=1000,config={‘es.nodes.wan.only’: ‘true’},verbose=True)
File “”, line 1
mt=l.export_elasticsearch(ht,host=‘xxxxxxxxxxxxxx.us-east-1.es.amazonaws.com’,port=443,index=‘singlevcf’,index_type=‘variant’,block_size=1000,config={‘es.nodes.wan.only’: ‘true’},verbose=True)
^
SyntaxError: invalid character in identifier


#8

What character is it pointing to?


#9

It is pointing at the s.


#10

Can you try retyping the script? Possibly it's a weird Unicode "s" copy/pasted from another font.


#11

@nara, Discourse, our forum software, converts matching quotes into curly quotes “like these” if the text is not formatted as code. When you copy source code and error messages into Discourse, you should wrap them in a pair of triple backticks (```), like this:

>>> mt=l.export_elasticsearch(ht,host='https://xxxxxxxxxxxxx.us-east-1.es.amazonaws.com',port=443,index='singlevcf',index_type='variant',block_size=1000,config=dict('es.nodes.wan.only','true'),verbose=True)

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

TypeError: dict expected at most 1 arguments, got 2

That avoids the automatic conversion to curly quotes (note that the quote before https is a straight up-and-down quote ', not a curly quote ‘). You can learn more about this formatting syntax here.


Tim copied your code and modified it without fixing the quotes. I’ve edited his post to use the non-curly quotes. Try this instead:

mt=l.export_elasticsearch(ht,
                          host='https://xxxxxxxxxxxxx.us-east-1.es.amazonaws.com',
                          port=443,
                          index='singlevcf',
                          index_type='variant',
                          block_size=1000,
                          config={'es.nodes.wan.only': 'true'},
                          verbose=True)

The Python syntax for a dictionary (a key-value mapping) is {key1: value1, key2: value2}. The dict function is used to convert things that are not dictionaries into dictionaries; for example, you can convert a list of pairs to a dictionary like this: dict([(key1, value1), (key2, value2)]).
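
For example (same key and value as above; nothing Hail-specific here):

# Both lines build the same single-entry dict that the config parameter expects.
config = {'es.nodes.wan.only': 'true'}            # dict literal: a colon separates key and value
config = dict([('es.nodes.wan.only', 'true')])    # dict() applied to a list of (key, value) pairs
assert config == {'es.nodes.wan.only': 'true'}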


#12

Thanks for that, @danking and @tpoterba. But I am now running into this; I have tried both syntaxes:

mt=l.export_elasticsearch(ht,
... host='https://xxxxxxxxxxxxxx.us-east-1.es.amazonaws.com',
... port=443,
... index='singlevcf',
... index_type='variant',
... block_size=1000,
... config={'es.nodes.wan.only','true'},
... verbose=True)
Traceback (most recent call last):
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 487, in check_all
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 59, in check
hail.typecheck.check.TypecheckFailure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "", line 2, in export_elasticsearch
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 559, in wrapper
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 513, in check_all
TypeError: export_elasticsearch: parameter 'config': expected (None or dict[str, str]), found set: {'true', 'es.nodes.wan.only'}

mt=l.export_elasticsearch(ht,
... host='https://xxxxxxxxxxxxxx.us-east-1.es.amazonaws.com',
... port=443,
... index='singlevcf',
... index_type='variant',
... block_size=1000,
... config={'true', 'es.nodes.wan.only'},
... verbose=True)
Traceback (most recent call last):
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 487, in check_all
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 59, in check
hail.typecheck.check.TypecheckFailure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "", line 2, in export_elasticsearch
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 559, in wrapper
  File "/opt/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 513, in check_all
TypeError: export_elasticsearch: parameter 'config': expected (None or dict[str, str]), found set: {'true', 'es.nodes.wan.only'}


#13

Hey, thanks for the support, guys. This finally worked:

>>> mt=l.export_elasticsearch(ht,host='https://xxxxxxxxxxxx.us-east-1.es.amazonaws.com',port=443,index='singlevcf',index_type='variant',block_size=1000,config={'es.nodes.wan.only':'true'},verbose=True)

Config Map(es.nodes.wan.only -> true, es.batch.size.entries -> 1000, es.index.auto.create -> true, es.port -> 443, es.nodes -> https://xxxxxxxxxxxxxxx.us-east-1.es.amazonaws.com)

[Stage 3:========================================================>(90 + 1) / 91]>>>


#14

Ah, my bad. I fixed the bug that you found in my post.