Hail Py4JError while calling z:is.hail.backend.spark.SparkBackend.apply

These are the commands I ran in an ipython shell:
import hail as hl
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=10,
                              n_variants=100)
mt.show()

I get the error "Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply." when running the second command. I'm very new to this (just started), so do let me know if you need any additional information!

Thanks!

A few questions:

  1. Where are you running hail? Just locally on a laptop, or on a cluster/cloud platform of some sort?
  2. What operating system are you using?
  3. What’s the full error message you’re seeing (you can upload a text file with it if it’s very big)?
  4. How did you install hail? Pip?

Hi, thank you for your help!

  1. Yes, I am running it locally on a laptop
  2. I am using Mac OSX
  3. I think this is the full error message I am seeing:
Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.

: java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;

at org.sparkproject.jetty.util.log.JettyAwareLogger.log(JettyAwareLogger.java:624)

at org.sparkproject.jetty.util.log.JettyAwareLogger.info(JettyAwareLogger.java:314)

at org.sparkproject.jetty.util.log.Slf4jLog.info(Slf4jLog.java:77)

at org.sparkproject.jetty.util.log.Log.initialized(Log.java:169)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:276)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:265)

at org.sparkproject.jetty.util.component.AbstractLifeCycle.<clinit>(AbstractLifeCycle.java:36)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:117)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:104)

at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:89)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1(WebUI.scala:70)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1$adapted(WebUI.scala:70)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:70)

at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:60)

at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)

at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:478)

at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:146)

at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:222)

at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:282)

at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

at py4j.commands.CallCommand.execute(CallCommand.java:79)

at py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)
  4. Yes, I used pip install hail

Ok, so my first guess is that we are running into a problem with your version of Java. Can you try running

java -version

in your terminal?

This is what I get:
openjdk version "1.8.0_302"

OpenJDK Runtime Environment (build 1.8.0_302-bre_2021_08_14_22_07-b00)

OpenJDK 64-Bit Server VM (build 25.302-b00, mixed mode)

Ok, that looks right. My next guess is that something is wrong with your Spark installation. Let's try running a small Spark script that doesn't use hail at all.

import pyspark
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc) 
sample = sqlContext.createDataFrame(
    [
        ('qwe', 23),
        ('rty',34),
        ('yui',56),
        ],
    ['abc', 'def'])
sample.show()

Let me know if that works; it should print a small table with that data.
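For reference, a successful run should end with a table along these lines (an output sketch; Spark's show() right-aligns values within each column):

```text
+---+---+
|abc|def|
+---+---+
|qwe| 23|
|rty| 34|
|yui| 56|
+---+---+
```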

Yes, I am having problems with this. I’m getting this error:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.

: java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;

at org.sparkproject.jetty.util.log.JettyAwareLogger.log(JettyAwareLogger.java:624)

at org.sparkproject.jetty.util.log.JettyAwareLogger.info(JettyAwareLogger.java:314)

at org.sparkproject.jetty.util.log.Slf4jLog.info(Slf4jLog.java:77)

at org.sparkproject.jetty.util.log.Log.initialized(Log.java:169)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:276)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:265)

at org.sparkproject.jetty.util.component.AbstractLifeCycle.<clinit>(AbstractLifeCycle.java:36)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:117)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:104)

at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:89)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1(WebUI.scala:70)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1$adapted(WebUI.scala:70)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:70)

at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:60)

at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)

at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:478)

at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:238)

at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)

at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)

at py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)

Alright, so hail is not working because your Spark installation isn't working. Do you use python and pip in your terminal, or do you use python3 and pip3 (I'm not recommending one over the other; it just depends on how you installed python)?

Can you run:

echo $SPARK_HOME
which python
which pip
pip list

If you use the python3 and pip3 commands instead, I'd also like to see the output of which python3 and which pip3.
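Since two installations can shadow each other on PATH, which -a is also handy here (the -a flag lists every match in PATH order rather than just the first); a small sketch:

```shell
# -a prints every matching executable in PATH order,
# so duplicate python/pip installs show up immediately
which -a python pip
```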

I use python and pip in my terminal.

  1. Nothing turns up when I run echo $SPARK_HOME
  2. When I run which python:
    /Users/rgs/miniconda3/bin/python
  3. When I run which pip:
    /Users/rgs/opt/miniconda3/bin/pip
  4. When I run pip list:
Package                  Version
------------------------ -------------------
aiohttp                  3.7.4
aiohttp-session          2.7.0
appnope                  0.1.2
async-timeout            3.0.1
asyncinit                0.2.4
attrs                    21.2.0
avro                     1.10.2
azure-core               1.18.0
azure-identity           1.6.0
azure-storage-blob       12.8.1
backcall                 0.2.0
bokeh                    1.4.0
boto3                    1.18.51
botocore                 1.21.51
brotlipy                 0.7.0
cachetools               4.2.4
certifi                  2021.5.30
cffi                     1.14.6
chardet                  3.0.4
conda                    4.10.3
conda-package-handling   1.7.3
cryptography             3.4.7
decorator                4.4.2
Deprecated               1.2.13
dill                     0.3.4
fsspec                   0.9.0
gcsfs                    0.8.0
google-api-core          1.31.3
google-auth              1.27.0
google-auth-oauthlib     0.4.6
google-cloud-core        1.7.2
google-cloud-storage     1.25.0
google-resumable-media   0.5.1
googleapis-common-protos 1.53.0
hail                     0.2.77
humanize                 1.0.0
hurry.filesize           0.9
idna                     2.10
ipython                  7.28.0
isodate                  0.6.0
janus                    0.6.1
jedi                     0.18.0
Jinja2                   3.0.1
jmespath                 0.10.0
MarkupSafe               2.0.1
matplotlib-inline        0.1.3
msal                     1.14.0
msal-extensions          0.3.0
msrest                   0.6.21
multidict                5.1.0
nest-asyncio             1.5.1
numpy                    1.21.2
oauthlib                 3.1.1
packaging                21.0
pandas                   1.1.4
parsimonious             0.8.1
parso                    0.8.2
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   8.3.2
pip                      21.1.3
portalocker              1.7.1
prompt-toolkit           3.0.20
protobuf                 3.17.3
ptyprocess               0.7.0
py4j                     0.10.9
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pycosat                  0.6.3
pycparser                2.20
Pygments                 2.10.0
PyJWT                    2.1.0
pyOpenSSL                20.0.1
pyparsing                2.4.7
PySocks                  1.7.1
pyspark                  3.1.2
python-dateutil          2.8.2
python-json-logger       0.1.11
pytz                     2021.1
PyYAML                   5.4.1
requests                 2.25.1
requests-oauthlib        1.3.0
rsa                      4.7.2
ruamel-yaml-conda        0.15.100
s3transfer               0.5.0
scipy                    1.6.3
setuptools               52.0.0.post20210125
six                      1.16.0
sortedcontainers         2.1.0
tabulate                 0.8.3
tornado                  6.1
tqdm                     4.42.1
traitlets                5.1.0
typing-extensions        3.10.0.2
ujson                    4.2.0
urllib3                  1.26.6
wcwidth                  0.2.5
wheel                    0.36.2
wrapt                    1.12.1
yarl                     1.6.3

Hey @Niveditha,

This seems very likely to be a class path issue.

Can you paste the output of

env

?

Can you also try:

CLASSPATH= python3 <<EOF
import pyspark
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc) 
sample = sqlContext.createDataFrame(
    [
        ('qwe', 23),
        ('rty',34),
        ('yui',56),
        ],
    ['abc', 'def'])
sample.show()
EOF
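(The CLASSPATH= prefix clears that variable for just the one command, so the script runs without whatever jars your environment would otherwise inject.) If CLASSPATH does turn out to be set, listing its entries can reveal a conflicting slf4j jar, which would explain the NoSuchMethodError above; a sketch:

```shell
# Print each CLASSPATH entry on its own line; an old slf4j-api jar here
# can shadow the slf4j version Spark bundles and break its logging calls
echo "$CLASSPATH" | tr ':' '\n' | grep -i 'slf4j'
```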
  1. This is what I get when I run env:
TERM_PROGRAM=Apple_Terminal

SHELL=/bin/bash

TERM=xterm-256color

TMPDIR=/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/

CONDA_SHLVL=1

CONDA_PROMPT_MODIFIER=(base)

TERM_PROGRAM_VERSION=433

TERM_SESSION_ID=6D1840A2-F515-4F79-966A-DE45C22C46A2

USER=rgs

CONDA_EXE=/Users/rgs/opt/miniconda3/bin/conda

SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.EQJNK677wH/Listeners

_CE_CONDA=

PATH=/Users/rgs/miniconda3/bin:/Users/rgs/Downloads/google-cloud-sdk/bin:/opt/local/bin:/opt/local/sbin:/Users/rgs/opt/miniconda3/bin:/Users/rgs/opt/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

CONDA_PREFIX=/Users/rgs/opt/miniconda3

PWD=/Users/rgs

XPC_FLAGS=0x0

_CE_M=

XPC_SERVICE_NAME=0

SHLVL=1

HOME=/Users/rgs

CONDA_PYTHON_EXE=/Users/rgs/opt/miniconda3/bin/python

LOGNAME=rgs

LC_CTYPE=UTF-8

CONDA_DEFAULT_ENV=base

_=/usr/bin/env
  2. And this is the output of the CLASSPATH= python3 script:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'

That … is not the error I expected.

Firstly, I’m somewhat concerned about your conda installation. It looks like you have two distinct conda installations on your PATH: /Users/rgs/miniconda3 as well as /Users/rgs/opt/miniconda3. You might try removing the /Users/rgs/miniconda3/bin entry from your PATH, because it seems Anaconda expects you to be using /Users/rgs/opt/miniconda3/bin (note the /opt/).
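For the current shell session, the stale entry can be filtered out with something like this (a sketch, assuming bash; the permanent fix is editing wherever PATH is set, e.g. ~/.bash_profile):

```shell
# Drop the stale miniconda entry from PATH for this session only:
# split PATH on ':', remove the exact entry, and rejoin
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v '^/Users/rgs/miniconda3/bin$' | paste -sd ':' -)
```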

After making that change, can you share the output of the following? You should be able to copy and paste the whole thing, then paste the full output back here.

which python
which python3
which pip
which pip3
env
CLASSPATH= python <<EOF
import pyspark
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc) 
sample = sqlContext.createDataFrame(
    [
        ('qwe', 23),
        ('rty',34),
        ('yui',56),
        ],
    ['abc', 'def'])
sample.show()
EOF

This is the output:

18:55:40.950 [main] WARN org.apache.spark.util.Utils - Your hostname, MacBook-Air-3.local resolves to a loopback address: 127.0.0.1; using 172.26.143.71 instead (on interface en0)

18:55:40.954 [main] WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address

18:55:41.365 [main] DEBUG o.a.spark.util.ShutdownHookManager - Adding shutdown hook

18:55:41.398 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory

java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302) [hadoop-common-3.2.0.jar:na]

at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.1.2.jar:3.1.2]

18:55:41.416 [main] DEBUG org.apache.hadoop.util.Shell - setsid is not available on this machine. So not using it.

18:55:41.416 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0

18:55:41.541 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)])

18:55:41.545 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)])

18:55:41.545 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[GetGroups])

18:55:41.545 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since startup])

18:55:41.546 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since last successful login])

18:55:41.547 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics

18:55:41.565 [main] DEBUG o.a.hadoop.security.SecurityUtil - Setting hadoop.security.token.service.use_ip to true

18:55:41.584 [main] DEBUG org.apache.hadoop.security.Groups - Creating new Groups object

18:55:41.588 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...

18:55:41.589 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path

18:55:41.589 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/Users/rgs/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.

18:55:41.589 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18:55:41.590 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based

18:55:41.592 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping

18:55:41.705 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000

18:55:41.718 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login

18:55:41.719 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit

18:55:41.729 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: rgs

18:55:41.730 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: rgs" with name rgs

18:55:41.730 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "rgs"

18:55:41.730 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:rgs (auth:SIMPLE)

18:55:41.732 [main] INFO org.apache.spark.SecurityManager - Changing view acls to: rgs

18:55:41.733 [main] INFO org.apache.spark.SecurityManager - Changing modify acls to: rgs

18:55:41.733 [main] INFO org.apache.spark.SecurityManager - Changing view acls groups to:

18:55:41.734 [main] INFO org.apache.spark.SecurityManager - Changing modify acls groups to:

18:55:41.734 [main] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rgs); groups with view permissions: Set(); users with modify permissions: Set(rgs); groups with modify permissions: Set()

18:55:41.866 [main] DEBUG o.a.s.api.python.PythonGatewayServer - Started PythonGatewayServer on port 57699

18:55:42.106 [Thread-3] INFO org.apache.spark.SparkContext - Running Spark version 3.1.2

18:55:42.196 [Thread-3] INFO o.a.spark.resource.ResourceUtils - ==============================================================

18:55:42.196 [Thread-3] INFO o.a.spark.resource.ResourceUtils - No custom resources configured for spark.driver.

18:55:42.197 [Thread-3] INFO o.a.spark.resource.ResourceUtils - ==============================================================

18:55:42.197 [Thread-3] INFO org.apache.spark.SparkContext - Submitted application: pyspark-shell

18:55:42.246 [Thread-3] INFO o.a.spark.resource.ResourceProfile - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)

18:55:42.267 [Thread-3] INFO o.a.spark.resource.ResourceProfile - Limiting resource is cpu

18:55:42.268 [Thread-3] INFO o.a.s.r.ResourceProfileManager - Added ResourceProfile id: 0

18:55:42.383 [Thread-3] INFO org.apache.spark.SecurityManager - Changing view acls to: rgs

18:55:42.384 [Thread-3] INFO org.apache.spark.SecurityManager - Changing modify acls to: rgs

18:55:42.384 [Thread-3] INFO org.apache.spark.SecurityManager - Changing view acls groups to:

18:55:42.384 [Thread-3] INFO org.apache.spark.SecurityManager - Changing modify acls groups to:

18:55:42.384 [Thread-3] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rgs); groups with view permissions: Set(); users with modify permissions: Set(rgs); groups with modify permissions: Set()

log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

18:55:42.870 [Thread-3] DEBUG o.a.s.network.server.TransportServer - Shuffle server started on port: 57701

18:55:42.886 [Thread-3] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 57701.

18:55:42.888 [Thread-3] DEBUG org.apache.spark.SparkEnv - Using serializer: class org.apache.spark.serializer.JavaSerializer

18:55:42.946 [Thread-3] INFO org.apache.spark.SparkEnv - Registering MapOutputTracker

18:55:42.947 [Thread-3] DEBUG o.a.s.MapOutputTrackerMasterEndpoint - init

18:55:43.017 [Thread-3] INFO org.apache.spark.SparkEnv - Registering BlockManagerMaster

18:55:43.063 [Thread-3] INFO o.a.s.s.BlockManagerMasterEndpoint - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information

18:55:43.064 [Thread-3] INFO o.a.s.s.BlockManagerMasterEndpoint - BlockManagerMasterEndpoint up

18:55:43.071 [Thread-3] INFO org.apache.spark.SparkEnv - Registering BlockManagerMasterHeartbeat

18:55:43.101 [Thread-3] INFO o.a.spark.storage.DiskBlockManager - Created local directory at /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/blockmgr-d8af6eb8-2713-4625-abf7-cc88e7c46ffb

18:55:43.102 [Thread-3] DEBUG o.a.spark.storage.DiskBlockManager - Adding shutdown hook

18:55:43.143 [Thread-3] INFO o.a.spark.storage.memory.MemoryStore - MemoryStore started with capacity 366.3 MiB

18:55:43.174 [Thread-3] INFO org.apache.spark.SparkEnv - Registering OutputCommitCoordinator

18:55:43.175 [Thread-3] DEBUG o.a.s.s.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - init

18:55:43.209 [Thread-3] DEBUG org.apache.spark.SecurityManager - Created SSL options for ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}

Traceback (most recent call last):

File "<stdin>", line 4, in <module>

File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/pyspark/context.py", line 146, in __init__

self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/pyspark/context.py", line 209, in _do_init

self._jsc = jsc or self._initialize_context(self._conf._jconf)

File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/pyspark/context.py", line 321, in _initialize_context

return self._jvm.JavaSparkContext(jconf)

File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/py4j/java_gateway.py", line 1568, in __call__

return_value = get_return_value(

File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value

raise Py4JJavaError(

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.

: java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;

at org.sparkproject.jetty.util.log.JettyAwareLogger.log(JettyAwareLogger.java:624)

at org.sparkproject.jetty.util.log.JettyAwareLogger.info(JettyAwareLogger.java:314)

at org.sparkproject.jetty.util.log.Slf4jLog.info(Slf4jLog.java:77)

at org.sparkproject.jetty.util.log.Log.initialized(Log.java:169)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:276)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:265)

at org.sparkproject.jetty.util.component.AbstractLifeCycle.<clinit>(AbstractLifeCycle.java:36)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:117)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:104)

at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:89)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1(WebUI.scala:70)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1$adapted(WebUI.scala:70)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:70)

at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:60)

at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)

at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:478)

at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:238)

at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)

at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)

at py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)

18:55:43.366 [main] DEBUG o.a.s.api.python.PythonGatewayServer - Exiting due to broken pipe from Python driver

18:55:43.371 [shutdown-hook-0] INFO o.a.spark.storage.DiskBlockManager - Shutdown hook called

(base) MacBook-Air-3:~ rgs$ 18:55:43.381 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Shutdown hook called

18:55:43.382 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-cb41c335-5bf8-4acd-86df-c55b1f0cc002

18:55:43.387 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-3a5cb3dd-92ce-44d2-bb40-5cd370c7d1e8

18:55:43.392 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-cb41c335-5bf8-4acd-86df-c55b1f0cc002/userFiles-ddeefde2-77a8-40c0-9a91-7beafee89728

18:55:43.398 [Thread-1] DEBUG o.a.hadoop.util.ShutdownHookManager - Completed shutdown in 0.028 seconds; Timeouts: 0

18:55:43.403 [Thread-1] DEBUG o.a.hadoop.util.ShutdownHookManager - ShutdownHookManger completed shutdown.

This suggests to me that your Spark installation is severely broken. Let’s try starting from scratch:

conda create -n conda-for-hail python=3.7
conda activate conda-for-hail

At this point your command line prompt should look something like this:

(conda-for-hail) #

Now try installing Hail and ipython:

python3 -m pip install hail ipython
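It's also worth a quick check that the shell now resolves python3 from inside the new env (paths will vary by machine):

```shell
# Both of these should point inside the conda-for-hail environment
which python3
python3 -c 'import sys; print(sys.prefix)'
```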

Now verify that hail was installed correctly:

python3 <<EOF
import hail as hl
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=10,
                              n_variants=100)
mt.show()
EOF

I unfortunately still get an error. Here is the complete output:

Initializing Hail with default parameters...

19:30:22.650 [main] WARN org.apache.spark.util.Utils - Your hostname, MacBook-Air-3.local resolves to a loopback address: 127.0.0.1; using 172.26.143.71 instead (on interface en0)

19:30:22.656 [main] WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address

19:30:23.074 [main] DEBUG o.a.spark.util.ShutdownHookManager - Adding shutdown hook

19:30:23.111 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory

java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207) [hadoop-common-3.2.0.jar:na]

at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302) [hadoop-common-3.2.0.jar:na]

at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) [spark-core_2.12-3.1.2.jar:3.1.2]

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.1.2.jar:3.1.2]

19:30:23.135 [main] DEBUG org.apache.hadoop.util.Shell - setsid is not available on this machine. So not using it.

19:30:23.135 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0

19:30:23.320 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)])

19:30:23.323 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)])

19:30:23.323 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[GetGroups])

19:30:23.324 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since startup])

19:30:23.325 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since last successful login])

19:30:23.327 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics

19:30:23.348 [main] DEBUG o.a.hadoop.security.SecurityUtil - Setting hadoop.security.token.service.use_ip to true

19:30:23.373 [main] DEBUG org.apache.hadoop.security.Groups - Creating new Groups object

19:30:23.376 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...

19:30:23.377 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path

19:30:23.377 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/Users/rgs/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.

19:30:23.377 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

19:30:23.378 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based

19:30:23.380 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping

19:30:23.486 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000

19:30:23.506 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login

19:30:23.507 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit

19:30:23.516 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: rgs

19:30:23.517 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: rgs" with name rgs

19:30:23.517 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "rgs"

19:30:23.517 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:rgs (auth:SIMPLE)

19:30:23.611 [main] DEBUG org.apache.hadoop.fs.FileSystem - Loading filesystems

19:30:23.626 [main] DEBUG org.apache.hadoop.fs.FileSystem - hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-hdfs-client-3.2.0.jar

19:30:23.646 [main] DEBUG org.apache.hadoop.fs.FileSystem - webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-hdfs-client-3.2.0.jar

19:30:23.646 [main] DEBUG org.apache.hadoop.fs.FileSystem - swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-hdfs-client-3.2.0.jar

19:30:23.651 [main] DEBUG org.apache.hadoop.fs.FileSystem - nullscan:// = class org.apache.hadoop.hive.ql.io.NullScanFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hive-exec-2.3.7-core.jar

19:30:23.668 [main] DEBUG org.apache.hadoop.fs.FileSystem - file:// = class org.apache.hadoop.fs.LocalFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-common-3.2.0.jar

19:30:23.669 [main] DEBUG org.apache.hadoop.fs.FileSystem - file:// = class org.apache.hadoop.hive.ql.io.ProxyLocalFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hive-exec-2.3.7-core.jar

19:30:23.677 [main] DEBUG org.apache.hadoop.fs.FileSystem - viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-common-3.2.0.jar

19:30:23.680 [main] DEBUG org.apache.hadoop.fs.FileSystem - har:// = class org.apache.hadoop.fs.HarFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-common-3.2.0.jar

19:30:23.683 [main] DEBUG org.apache.hadoop.fs.FileSystem - http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-common-3.2.0.jar

19:30:23.684 [main] DEBUG org.apache.hadoop.fs.FileSystem - https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/pyspark/jars/hadoop-common-3.2.0.jar

19:30:23.685 [main] DEBUG org.apache.hadoop.fs.FileSystem - Looking for FS supporting file

19:30:23.685 [main] DEBUG org.apache.hadoop.fs.FileSystem - looking for configuration option fs.file.impl

19:30:23.696 [main] DEBUG org.apache.hadoop.fs.FileSystem - Looking in service filesystems for implementation class

19:30:23.696 [main] DEBUG org.apache.hadoop.fs.FileSystem - FS for file is class org.apache.hadoop.hive.ql.io.ProxyLocalFileSystem

19:30:23.808 [main] INFO org.apache.spark.SecurityManager - Changing view acls to: rgs

19:30:23.808 [main] INFO org.apache.spark.SecurityManager - Changing modify acls to: rgs

19:30:23.809 [main] INFO org.apache.spark.SecurityManager - Changing view acls groups to:

19:30:23.810 [main] INFO org.apache.spark.SecurityManager - Changing modify acls groups to:

19:30:23.811 [main] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rgs); groups with view permissions: Set(); users with modify permissions: Set(rgs); groups with modify permissions: Set()

19:30:23.949 [main] DEBUG o.a.s.api.python.PythonGatewayServer - Started PythonGatewayServer on port 59202

2021-09-30 19:30:24 WARN Hail:43 - This Hail JAR was compiled for Spark 3.1.1, running with Spark 3.1.2.

Compatibility is not guaranteed.

19:30:24.243 [Thread-4] INFO org.apache.spark.SparkContext - Running Spark version 3.1.2

19:30:24.300 [Thread-4] INFO o.a.spark.resource.ResourceUtils - ==============================================================

19:30:24.300 [Thread-4] INFO o.a.spark.resource.ResourceUtils - No custom resources configured for spark.driver.

19:30:24.301 [Thread-4] INFO o.a.spark.resource.ResourceUtils - ==============================================================

19:30:24.301 [Thread-4] INFO org.apache.spark.SparkContext - Submitted application: Hail

19:30:24.308 [Thread-4] INFO org.apache.spark.SparkContext - Spark configuration:

spark.app.name=Hail

spark.app.startTime=1633026624243

spark.driver.extraClassPath=/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar

spark.driver.maxResultSize=0

spark.executor.extraClassPath=./hail-all-spark.jar

spark.hadoop.io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,is.hail.io.compress.BGzipCodec,is.hail.io.compress.BGzipCodecTbi,org.apache.hadoop.io.compress.GzipCodec

spark.hadoop.mapreduce.input.fileinputformat.split.minsize=0

spark.jars=file:///Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar

spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator

spark.kryoserializer.buffer.max=1g

spark.logConf=true

spark.master=local[*]

spark.repl.local.jars=file:///Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.submit.deployMode=client

spark.submit.pyFiles=

spark.ui.showConsoleProgress=false

19:30:24.345 [Thread-4] INFO o.a.spark.resource.ResourceProfile - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)

19:30:24.363 [Thread-4] INFO o.a.spark.resource.ResourceProfile - Limiting resource is cpu

19:30:24.363 [Thread-4] INFO o.a.s.r.ResourceProfileManager - Added ResourceProfile id: 0

19:30:24.448 [Thread-4] INFO org.apache.spark.SecurityManager - Changing view acls to: rgs

19:30:24.449 [Thread-4] INFO org.apache.spark.SecurityManager - Changing modify acls to: rgs

19:30:24.449 [Thread-4] INFO org.apache.spark.SecurityManager - Changing view acls groups to:

19:30:24.449 [Thread-4] INFO org.apache.spark.SecurityManager - Changing modify acls groups to:

19:30:24.449 [Thread-4] INFO org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rgs); groups with view permissions: Set(); users with modify permissions: Set(rgs); groups with modify permissions: Set()

19:30:24.837 [Thread-4] DEBUG o.a.s.network.server.TransportServer - Shuffle server started on port: 59204

19:30:24.847 [Thread-4] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 59204.

19:30:24.863 [Thread-4] DEBUG org.apache.spark.SparkEnv - Using serializer: class org.apache.spark.serializer.KryoSerializer

19:30:24.894 [Thread-4] INFO org.apache.spark.SparkEnv - Registering MapOutputTracker

19:30:24.894 [Thread-4] DEBUG o.a.s.MapOutputTrackerMasterEndpoint - init

19:30:24.969 [Thread-4] INFO org.apache.spark.SparkEnv - Registering BlockManagerMaster

19:30:25.017 [Thread-4] INFO o.a.s.s.BlockManagerMasterEndpoint - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information

19:30:25.018 [Thread-4] INFO o.a.s.s.BlockManagerMasterEndpoint - BlockManagerMasterEndpoint up

19:30:25.023 [Thread-4] INFO org.apache.spark.SparkEnv - Registering BlockManagerMasterHeartbeat

19:30:25.060 [Thread-4] INFO o.a.spark.storage.DiskBlockManager - Created local directory at /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/blockmgr-61101b16-4322-4f44-9e36-fc0d6f379d05

19:30:25.062 [Thread-4] DEBUG o.a.spark.storage.DiskBlockManager - Adding shutdown hook

19:30:25.104 [Thread-4] INFO o.a.spark.storage.memory.MemoryStore - MemoryStore started with capacity 366.3 MiB

19:30:25.137 [Thread-4] INFO org.apache.spark.SparkEnv - Registering OutputCommitCoordinator

19:30:25.138 [Thread-4] DEBUG o.a.s.s.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - init

19:30:25.168 [Thread-4] DEBUG org.apache.spark.SecurityManager - Created SSL options for ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}

Traceback (most recent call last):

File "<stdin>", line 4, in <module>

File "<decorator-gen-1741>", line 2, in balding_nichols_model

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 576, in wrapper

args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 543, in check_all

args_.append(arg_check(args[i], name, arg_name, checker))

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 584, in arg_check

return checker.check(arg, function_name, arg_name)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 82, in check

return tc.check(x, caller, param)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 328, in check

return f(tc.check(x, caller, param))

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/genetics/reference_genome.py", line 10, in <lambda>

reference_genome_type = oneof(transformed((str, lambda x: hl.get_reference(x))), rg_type)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/context.py", line 554, in get_reference

Env.hc()

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/utils/java.py", line 55, in hc

init()

File "<decorator-gen-1821>", line 2, in init

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper

return __original_func(*args_, **kwargs_)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/context.py", line 252, in init

skip_logging_configuration, optimizer_iterations)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/hail/backend/spark_backend.py", line 174, in __init__

jsc, app_name, master, local, True, min_block_size, tmpdir, local_tmpdir)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__

answer, self.gateway_client, self.target_id, self.name)

File "/Users/rgs/opt/miniconda3/envs/conda-for-hail/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value

format(target_id, ".", name), value)

py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.

: java.lang.NoSuchMethodError: org.slf4j.helpers.MessageFormatter.arrayFormat(Ljava/lang/String;[Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;

at org.sparkproject.jetty.util.log.JettyAwareLogger.log(JettyAwareLogger.java:624)

at org.sparkproject.jetty.util.log.JettyAwareLogger.info(JettyAwareLogger.java:314)

at org.sparkproject.jetty.util.log.Slf4jLog.info(Slf4jLog.java:77)

at org.sparkproject.jetty.util.log.Log.initialized(Log.java:169)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:276)

at org.sparkproject.jetty.util.log.Log.getLogger(Log.java:265)

at org.sparkproject.jetty.util.component.AbstractLifeCycle.<clinit>(AbstractLifeCycle.java:36)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:117)

at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:104)

at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:89)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1(WebUI.scala:70)

at org.apache.spark.ui.WebUI.$anonfun$attachTab$1$adapted(WebUI.scala:70)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:70)

at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:60)

at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)

at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:478)

at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:146)

at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:222)

at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:282)

at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

at py4j.commands.CallCommand.execute(CallCommand.java:79)

at py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)

19:30:25.560 [main] DEBUG o.a.s.api.python.PythonGatewayServer - Exiting due to broken pipe from Python driver

(conda-for-hail) MacBook-Air-3:~ rgs$ 19:30:25.564 [shutdown-hook-0] INFO o.a.spark.storage.DiskBlockManager - Shutdown hook called

19:30:25.580 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Shutdown hook called

19:30:25.580 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-1882f0f2-aad1-40e4-a272-ba3e80661af9

19:30:25.585 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-53cac155-6e4c-4c78-93f7-b3bb65dc427b/userFiles-389708a4-5749-4a3e-bd3a-c0cc3cecccb7

19:30:25.590 [shutdown-hook-0] INFO o.a.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/sm/795yv_kj01z2spgzwlq0syth0000gn/T/spark-53cac155-6e4c-4c78-93f7-b3bb65dc427b

19:30:25.597 [Thread-1] DEBUG o.a.hadoop.util.ShutdownHookManager - Completed shutdown in 0.033 seconds; Timeouts: 0

19:30:25.603 [Thread-1] DEBUG o.a.hadoop.util.ShutdownHookManager - ShutdownHookManger completed shutdown.

@Niveditha,

Sorry for all the failed attempts. I have one last idea: this might be due to a misconfiguration in Hail’s build system, though I do not know why your system in particular uncovers it. Can you try downloading this wheel:

https://storage.googleapis.com/storage/v1/b/hail-common/o/danking%2Fhail-0.2.77-py3-none-any.whl?alt=media

And saving it with the name hail-0.2.77-py3-none-any.whl

And installing it with:

pip3 install hail-0.2.77-py3-none-any.whl

If you have curl installed, you can just run this:

curl 'https://storage.googleapis.com/storage/v1/b/hail-common/o/danking%2Fhail-0.2.77-py3-none-any.whl?alt=media' > hail-0.2.77-py3-none-any.whl
pip3 install hail-0.2.77-py3-none-any.whl
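
If an earlier Hail install is already present, pip may report “Requirement already satisfied” and skip the wheel. A force reinstall (a sketch using standard pip flags, with the wheel filename from above) works around that:

```shell
# Replace any previously installed Hail with the downloaded wheel.
# --force-reinstall reinstalls even if pip thinks the same version
# is already present.
python3 -m pip install --force-reinstall hail-0.2.77-py3-none-any.whl
```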

Finally, can you try this again?

python3 <<EOF
import hail as hl
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=10,
                              n_variants=100)
mt.show()
EOF

@danking Thank you for all the suggestions thus far.
I had to force-reinstall Hail, which involved a slight modification of your suggestion above. In the end, I got this new error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'hail'

Huh. That means Hail didn’t install correctly. Are you sure the pip binary you used and the python binary you used came from the same Python installation? You can verify by looking at the paths printed by the following (assuming that you used pip3 and python3):

which pip3
which python3
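
Another quick check (just a sketch, not Hail-specific) is to ask the interpreter itself which executable it is and where the shell’s pip3 lives; this catches cases where `which` looks consistent but the two tools still belong to different environments:

```python
import sys
import shutil

# The interpreter actually running this code:
print(sys.executable)

# The pip3 that the shell would run; ideally it sits next to sys.executable
# (shutil.which returns None if pip3 is not on PATH):
print(shutil.which("pip3"))
```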

And you’re certain there were no errors once you finally got Hail installed? Does pip3 think it is installed? Make sure both these commands print the same thing:

pip3 show hail
python3 -m show hail
  1. Sorry, I made an error when running it the previous time. But when I run
python3 <<EOF
import hail as hl
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=10,
                              n_variants=100)
mt.show()
EOF

now, it shows the usual error message.

  2. which pip3 and which python3 show the same output
  3. You’re right, both those commands do not print the same thing.
    python3 -m show hail results in an error:
Traceback (most recent call last):
  File "/Users/rgs/opt/miniconda3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/rgs/opt/miniconda3/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/rgs/opt/miniconda3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/show/__init__.py", line 2, in <module>
    from show.core import show, Show, noshow, NoShow, fmt, say
  File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/show/core.py", line 16, in <module>
    from .introspect import *
  File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/show/introspect.py", line 7, in <module>
    from .astor import to_source as astor_to_source
  File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/show/astor/__init__.py", line 12, in <module>
    from .code_gen import to_source  # NOQA
  File "/Users/rgs/opt/miniconda3/lib/python3.9/site-packages/show/astor/code_gen.py", line 264
    def visit_FunctionDef(self, node, async=False):
                                      ^
SyntaxError: invalid syntax

Ah, yeah, that’s my mistake; it should have been python3 -m pip show hail.
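
For what it’s worth, a pip-independent way to check whether a given interpreter can see Hail (a sketch using only the standard library) is:

```python
import importlib.util

# find_spec returns None if the module cannot be imported by this
# interpreter; otherwise spec.origin shows where it was found.
spec = importlib.util.find_spec("hail")
print(spec.origin if spec else "hail is not visible to this interpreter")
```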

I’m sorry, @Niveditha, but I’m all out of ideas. There’s something fishy with Spark, Java, or SLF4J on your laptop, and I just have no idea what the issue is.