java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST

Hi,

I’m trying to get Hail 0.2 to run, but it’s failing and I don’t know what to do. Below are the commands and the error message.

Python 3.6.5 (default, May 3 2018, 10:08:28)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import hail as hl
>>> mt = hl.balding_nichols_model(3, 100, 100)
Initializing Spark and Hail with default parameters...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/06/19 15:03:11 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.api.python.PythonGatewayServer$$anonfun$main$1.apply$mcV$sp(PythonGatewayServer.scala:50)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1262)
at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 546, in wrapper
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 483, in check_all
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 56, in check
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 282, in check
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/genetics/reference_genome.py", line 8, in <module>
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 211, in get_reference
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 183, in default_reference
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/utils/java.py", line 59, in hc
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 547, in wrapper
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 160, in init
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 547, in wrapper
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 35, in init
  File "/usr/local/lib/python3.6/dist-packages/pyspark/context.py", line 292, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/usr/local/lib/python3.6/dist-packages/pyspark/java_gateway.py", line 93, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

Any idea?

Cheers,

Steph

This is a PySpark issue, not a Hail issue. The results on Google look pretty sparse, which is worrying.

How did you start pyspark / python? Are you on a cluster or a laptop?
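In the meantime, a quick way to see which pyspark and which Spark your Python session is actually picking up (a sketch, assuming pyspark is importable from the same python3 you launched; a mismatch between the two is a common cause of gateway errors like this):

# Show the pyspark package version and location, then the Spark install it should match
python3 -c "import pyspark; print(pyspark.__version__); print(pyspark.__file__)"
echo $SPARK_HOME
$SPARK_HOME/bin/spark-submit --version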

Sorry for the delay, I didn’t get an email alert when @tpoterba answered.

I’m on a cluster, and I simply started Python with "python3". Maybe that’s the issue? Should I do something else first?

What kind of cluster? Is this a Google Dataproc spark cluster?

Usually, if you’re using Spark, you start an interactive session with pyspark not python3.
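For example (a minimal sketch; the Hail-specific configuration flags are covered on the Getting Started page):

# Launch the interactive PySpark shell that ships with your Spark install,
# so the Python-side pyspark code and the Spark jars come from the same place.
$SPARK_HOME/bin/pyspark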

How did you get Hail? Are you using the Hail distribution, compiling it on the server, or using the Hail JAR and ZIP found in our public Google Storage? If you haven’t set the necessary environment variables, Hail won’t work even if you resolve this Spark issue.

In particular, take a look at the Getting Started page for pyspark arguments and environment variables necessary for Hail to start correctly.

If you’re using a Google Dataproc cluster, I strongly recommend using cloudtools to start, stop, and submit jobs to Dataproc clusters.
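The workflow looks roughly like this (cluster and script names below are placeholders; see the cloudtools README for the exact flags):

pip install cloudtools                 # Neale lab cloudtools
cluster start mycluster                # spin up a Dataproc cluster configured for Hail
cluster submit mycluster my_script.py  # run a Hail script on the cluster
cluster stop mycluster                 # tear the cluster down when you're done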

So, I have an Ubuntu instance with 22 cores on which I have full admin rights. I cleared everything out and made a brand-new instance.

I installed Spark 2.2.0, Anaconda (Python 3.6), and Scala, and downloaded the pre-compiled Hail for Spark 2.2.0.
This is my .bashrc:

# added by Anaconda3 installer
export PATH="/usr/local/anaconda3/bin:$PATH"

export SPARK_HOME=/usr/local/spark
export HAIL_HOME=/usr/local/hail
export PATH=$PATH:$HAIL_HOME/bin/
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$HAIL_HOME/build/distributions/hail-python.zip"
export PYTHONPATH="$PYTHONPATH:$SPARK_HOME/python"
export PYTHONPATH="$PYTHONPATH:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip"
## PYSPARK_SUBMIT_ARGS is used by ipython and jupyter
export PYSPARK_SUBMIT_ARGS="\
  --conf spark.driver.extraClassPath=\"$HAIL_HOME/build/libs/hail-all-spark.jar\" \
  --conf spark.executor.extraClassPath=./hail-all-spark.jar \
  pyspark-shell"

I’ve tried ipython:

$ ipython
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import hail as hl
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-90be69de01e3> in <module>()
----> 1 import hail as hl

ModuleNotFoundError: No module named 'hail'

and pyspark:

$ pyspark \
>   --conf spark.driver.extraClassPath=$HAIL_HOME/build/libs/hail-all-spark.jar \
>   --conf spark.executor.extraClassPath=./hail-all-spark.jar \
>   --conf spark.sql.files.openCostInBytes=1099511627776 \
>   --conf spark.sql.files.maxPartitionBytes=1099511627776 \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   --conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/07/05 15:14:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/07/05 15:14:42 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:236)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
  File "/usr/local/spark/python/pyspark/shell.py", line 45, in <module>
    spark = SparkSession.builder\
  File "/usr/local/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/usr/local/spark/python/pyspark/context.py", line 334, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/spark/python/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/local/spark/python/pyspark/context.py", line 273, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.ExceptionInInitializerError
        at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:546)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:373)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: deloukas-haildevel: deloukas-haildevel: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:891)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:884)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:884)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.localHostName(Utils.scala:941)
        at org.apache.spark.internal.config.package$.<init>(package.scala:204)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 14 more
Caused by: java.net.UnknownHostException: deloukas-haildevel: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
        ... 23 more


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/spark/python/pyspark/shell.py", line 54, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/usr/local/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/usr/local/spark/python/pyspark/context.py", line 334, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/spark/python/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/local/spark/python/pyspark/context.py", line 273, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.internal.config.package$
        at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:546)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:373)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)

>>>

I have no clue what is happening, why, or what to do; everything went smoothly when I installed Hail 0.1.

Any idea?

The root issue of the first problem is that ipython cannot find the Hail zip. When you say "pre-compiled Hail for Spark 2.2.0", do you mean the "Current distribution for Spark 2.2.0" listed under the heading "Running Hail locally with a pre-compiled distribution", whose URL looks like https://storage.googleapis.com/hail-common/distributions/devel/Hail-devel-1a29de719de9-Spark-2.2.0.zip (the hash after "devel-" might be different)? For the distribution version of Hail, you need to follow the instructions in that section of Getting Started. The environment variables you shared above are for running Hail on a custom Spark cluster and require building Hail from source.

If you have the hail distribution, this will start an ipython session:

export SPARK_HOME=/path/to/spark
unzip /path/to/Hail-devel-SOME_HASH-Spark-2.2.0.zip
export HAIL_HOME=/path/to/unzipped/hail
export PATH=$PATH:$HAIL_HOME/bin/
conda env create -n hail -f $HAIL_HOME/python/hail/environment.yml

source activate hail
ihail
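If ihail isn’t found or the env create step fails, two quick sanity checks (using only the paths from the commands above) are:

# Both of these should exist in an unpacked distribution
ls $HAIL_HOME/bin/ihail
ls $HAIL_HOME/python/hail/environment.yml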

The root issue of the second problem is this:

Caused by: java.net.UnknownHostException: deloukas-haildevel: Name or service not known

Your machine has a hostname that does not resolve to an IP address. If deloukas-haildevel is the hostname of your Ubuntu instance, then modify /etc/hosts to contain:

127.0.0.1 deloukas-haildevel
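After saving /etc/hosts, one way to confirm the name now resolves is:

# Should print an address (e.g. 127.0.0.1) for the instance's hostname
getent hosts deloukas-haildevel
getent hosts $(hostname)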

Moreover, the extraClassPath settings above will not work if you have the pre-built Hail distribution, because there is no JAR at the path $HAIL_HOME/build/libs/hail-all-spark.jar.
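If you’re unsure where (or whether) a jar exists in what you downloaded, a generic search is:

# Print every hail-all-spark.jar under HAIL_HOME; no output means the
# extraClassPath settings above point at a file that doesn't exist.
find $HAIL_HOME -name hail-all-spark.jar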


@danking,

thank you so much! I compiled from source and updated /etc/hosts and now it works! :smiley:

Thanks again for your help,

Steph
