HailContext(sc) error

In [6]: hc = hail.HailContext()

Py4JJavaError Traceback (most recent call last)
in <module>()
----> 1 hc = hail.HailContext()

/home/eila/hail/python/hail/context.pyc in __init__(self, sc, appName, master, local, log, quiet, append, parquet_compression, min_block_size, branching_factor, tmp_dir)
69 self._jhc = scala_object(self._hail, 'HailContext').apply(
70 jsc, appName, joption(master), local, log, quiet, append,
--> 71 parquet_compression, min_block_size, branching_factor, tmp_dir)
73 self._jsc = self._jhc.sc()

/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1135 for temp_arg in temp_args:

/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(

Py4JJavaError: An error occurred while calling o3.apply.
: org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:543)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:541)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:541)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:529)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:529)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:365)
at is.hail.HailContext$.configureAndCreateSparkContext(HailContext.scala:86)
at is.hail.HailContext$.apply(HailContext.scala:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)



Sorry that I didn't continue the older thread; I received a server error numerous times.

Anyway, I tried to regenerate the environment on a machine with larger resources. Everything went smoothly, and I was able to install the Anaconda environment this time.

The only issue I encountered was with the IPython installation: it was looking for Python.h, and to resolve that I installed the python-dev package.

When I run the test code over SSH on the master machine:

import hail => no issues
hc = hail.HailContext(sc) => fires error from Py4JJavaError (see in the above post)
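Judging from the Spark error message ("Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former."), one plausible local workaround, sketched here as an assumption rather than a confirmed fix, is to unset the deprecated `SPARK_CLASSPATH` environment variable before starting Python, so only `spark.executor.extraClassPath` remains set:

```shell
# Assumed workaround for the traceback above: Spark refuses to start when
# both SPARK_CLASSPATH (deprecated) and spark.executor.extraClassPath are
# configured. Clearing the environment variable leaves only the latter.
unset SPARK_CLASSPATH

# Then launch the shell and create the context as before:
#   ipython
#   >>> import hail
#   >>> hc = hail.HailContext()
```

As the replies below note, on Dataproc the recommended route is to submit jobs via gcloud rather than running interactively over SSH, which sidesteps this conflict entirely.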

Could you please let me know what it might be?


What was your dataproc submission command? I haven’t seen this error before on GCP.

I wanted to test the installation. Is this the right way to do so, or should I run the submit command using gcloud?

Definitely test using `gcloud dataproc jobs submit pyspark`. I think dataproc probably sets up some Spark config automatically, which makes the Hail 'getting started' setup unnecessary (and looks like it causes conflicts!).
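For illustration, a hypothetical submission might look like the following; the cluster name, script name, and zip path are placeholders, not values from this thread:

```shell
# Hypothetical example: submit a PyHail script to a Dataproc cluster.
# "my-cluster", "my_hail_script.py", and "hail-python.zip" are placeholders.
gcloud dataproc jobs submit pyspark my_hail_script.py \
    --cluster=my-cluster \
    --py-files=hail-python.zip
```

Submitting through gcloud lets Dataproc inject its own Spark configuration, which is exactly why the manual classpath setup from the 'getting started' guide becomes unnecessary.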

Generally, the only time we SSH into dataproc VMs is when we need to get local log files. Nearly everything goes through gcloud.

Excellent, tested and working!

How do you shut down a cluster? By shutting down the VMs?
The online suggestion was to delete them. Do you know of any other option?


You can shut down the cluster using the UI or the command line tool: https://cloud.google.com/dataproc/docs/guides/manage-cluster
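Per the linked guide, deleting the whole cluster in one command is the usual approach; a sketch, with a placeholder cluster name and region:

```shell
# Hypothetical example: delete an entire Dataproc cluster (all its VMs)
# in one step. "my-cluster" and "us-central1" are placeholders.
gcloud dataproc clusters delete my-cluster --region=us-central1
```

This tears down the master and all worker VMs together, so there is no need to touch individual instances.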

Shutting down the VMs one by one sounds incredibly painful, don’t do that!

Hi! You can do it with the GUI or on the command line. With regard to general questions about Google Cloud Platform, I’d suggest checking out their extensive docs: