TypeError: 'JavaPackage' object is not callable when using Pyspark

Whenever I try to use PySpark within Python code:

from pyspark import SparkConf, SparkContext

conf = (SparkConf().set("spark.executor.memory", "5g").set("spark driver.memory", "5g"))
sc = SparkContext(conf=conf)
import hail as hl
def file2hailMTX(bed, bim, fam, vcf, paramSM, outputMTX):
    hl.init(min_block_size=128)
    if paramSM == 'vcf':
        hl.import_vcf(vcf, force_bgz=True).write(outputMTX, overwrite=True)
    else:
        hl.import_plink(bed=bed, bim=bim, fam=fam).write(outputMTX, overwrite=True)

file2hailMTX(snakemake.input[0], snakemake.input[1], snakemake.input[2], snakemake.input[3], snakemake.params[0], snakemake.output[0])

I am facing the error:

Warning: Ignoring non-Spark config property: spark driver.memory
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/03/14 12:28:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "/Users/ani/PycharmProjects/HailStuff/.snakemake/scripts/tmp9oofbn4p.NewHailScript.py", line 17, in <module>
    file2hailMTX(snakemake.input[0], snakemake.input[1], snakemake.input[2], snakemake.input[3], snakemake.params[0], snakemake.output[0])
  File "/Users/ani/PycharmProjects/HailStuff/.snakemake/scripts/tmp9oofbn4p.NewHailScript.py", line 11, in file2hailMTX
    hl.init(min_block_size=128)
  File "<decorator-gen-1700>", line 2, in init
  File "/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/hail/context.py", line 290, in init
    backend = SparkBackend(
  File "/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/hail/backend/spark_backend.py", line 181, in __init__
    self._jbackend = hail_package.backend.spark.SparkBackend.apply(
TypeError: 'JavaPackage' object is not callable
[Mon Mar 14 12:28:32 2022]
Error in rule vcf_or_plink_hail:
    jobid: 4
    output: MTX/mtx_from_chr4.mt

RuleException:
CalledProcessError in line 12 of /Users/ani/PycharmProjects/HailStuff/rules/vcf_or_plink.smk:
Command 'set -euo pipefail;  /Users/ani/opt/anaconda3/envs/snakemake/bin/python3.10 /Users/ani/PycharmProjects/HailStuff/.snakemake/scripts/tmp9oofbn4p.NewHailScript.py' returned non-zero exit status 1.
  File "/Users/ani/PycharmProjects/HailStuff/rules/vcf_or_plink.smk", line 12, in __rule_vcf_or_plink_hail
  File "/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/ani/PycharmProjects/HailStuff/.snakemake/log/2022-03-14T122823.285810.snakemake.log

As far as I can tell, the main issue is: TypeError: 'JavaPackage' object is not callable. I have looked through other reports of this error, but I think my case is a bit different. Possibly I have done something wrong in my Python code, but I have no idea what the error actually is.

Hey @annalisasnow!

This means Hail is not installed correctly. What kind of computer are you using? How did you install Hail? Can you share the output of:

java -version
echo $JAVA_HOME
which java
python --version
python3 --version
pip3 show hail
python -m pip show hail
python3 -m pip show hail

There are actually two computers showing the same error: one is a Mac, the other is a Linux machine running CentOS. Since the Mac is my local machine, where I often test things, let's start with it. I also think that if we find the error there, I will be able to reproduce the fix on the Linux machine.

java -version

openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)

Next:
echo $JAVA_HOME

Is empty.
Next:

which java
/usr/bin/java
python --version
Python 3.9.7
python3 --version
Python 3.9.7
pip3 show hail

Name: hail
Version: 0.2.85
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /Users/ani/opt/anaconda3/lib/python3.9/site-packages
Requires: avro, requests, nest-asyncio, botocore, decorator, aiohttp, azure-identity, google-auth, pandas, dill, hurry.filesize, parsimonious, asyncinit, google-cloud-storage, Deprecated, tqdm, PyJWT, aiohttp-session, scipy, python-json-logger, pyspark, uvloop, janus, boto3, bokeh, numpy, sortedcontainers, gcsfs, plotly, tabulate, humanize, orjson, azure-storage-blob
Required-by:
python -m pip show hail
Name: hail
Version: 0.2.85
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /Users/ani/opt/anaconda3/lib/python3.9/site-packages
Requires: azure-identity, PyJWT, gcsfs, hurry.filesize, humanize, avro, decorator, aiohttp-session, tabulate, aiohttp, azure-storage-blob, pyspark, plotly, scipy, requests, google-auth, python-json-logger, sortedcontainers, uvloop, janus, dill, nest-asyncio, orjson, pandas, google-cloud-storage, asyncinit, parsimonious, numpy, botocore, boto3, Deprecated, bokeh, tqdm
Required-by: 

And finally:

python3 -m pip show hail
Name: hail
Version: 0.2.85
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /Users/ani/opt/anaconda3/lib/python3.9/site-packages
Requires: botocore, nest-asyncio, sortedcontainers, azure-identity, aiohttp-session, tqdm, decorator, google-auth, orjson, boto3, aiohttp, pyspark, asyncinit, dill, gcsfs, Deprecated, parsimonious, humanize, janus, bokeh, azure-storage-blob, tabulate, plotly, requests, scipy, hurry.filesize, PyJWT, uvloop, pandas, google-cloud-storage, python-json-logger, avro, numpy
Required-by: 

Just in case, I’ve repeated it on CentOS:

java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

And JAVA_HOME is empty again:

echo $JAVA_HOME

Then

which java
/usr/bin/java
python --version
Python 3.9.7
python3 --version
Python 3.9.7
pip3 show hail
Name: hail
Version: 0.2.89
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /home/karchevskaya/anaconda3/lib/python3.9/site-packages
Requires: parsimonious, azure-identity, aiohttp, pandas, boto3, google-cloud-storage, orjson, scipy, botocore, numpy, tabulate, janus, hurry.filesize, azure-storage-blob, aiohttp-session, tqdm, python-json-logger, bokeh, dill, sortedcontainers, uvloop, requests, gcsfs, plotly, decorator, humanize, PyJWT, nest-asyncio, asyncinit, avro, Deprecated, pyspark, google-auth
Required-by: 

python -m pip show hail
Name: hail
Version: 0.2.89
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /home/karchevskaya/anaconda3/lib/python3.9/site-packages
Requires: gcsfs, scipy, tabulate, aiohttp, botocore, google-cloud-storage, asyncinit, hurry.filesize, python-json-logger, requests, nest-asyncio, sortedcontainers, azure-identity, pandas, pyspark, tqdm, bokeh, parsimonious, numpy, orjson, azure-storage-blob, avro, google-auth, PyJWT, boto3, dill, uvloop, plotly, aiohttp-session, Deprecated, humanize, janus, decorator
Required-by: 

The last one:

python3 -m pip show hail
Name: hail
Version: 0.2.89
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /home/karchevskaya/anaconda3/lib/python3.9/site-packages
Requires: requests, asyncinit, orjson, pyspark, aiohttp-session, google-cloud-storage, tabulate, boto3, hurry.filesize, azure-storage-blob, google-auth, nest-asyncio, decorator, avro, scipy, bokeh, tqdm, aiohttp, sortedcontainers, janus, numpy, parsimonious, gcsfs, azure-identity, uvloop, Deprecated, pandas, botocore, plotly, dill, humanize, PyJWT, python-json-logger
Required-by: 

Hmm. So the first thing that stands out to me is that the Mac computer has two installations of Hail. In your first post, the error message includes this path:

/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/hail/backend/spark_backend.py

But pip thinks Hail is located here:

/Users/ani/opt/anaconda3/lib/python3.9/site-packages

Let’s try to replicate this error directly. What happens if you run this?

conda activate snakemake
python3 -c 'import hail as hl; hl.utils.range_table(10)._force_count()'
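Separately, a quick way to pin down which copy of a package each interpreter actually imports is a stdlib-only check like this (just a sketch; `module_location` is an illustrative helper name):

```python
import importlib.util

def module_location(name):
    """Return the file a module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec and spec.origin else None

# A stdlib module resolves to the active interpreter's own tree:
print(module_location("json"))
# Checking "hail" and "pyspark" the same way inside each conda environment
# shows which site-packages directory is really being used.
```

Running that with and without `conda activate snakemake` should make it obvious if the two environments are resolving different Hail installations.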

I think there’s snakemake’s own pip and libraries as well as the default pip (maybe I am wrong). As for the output, here it is:

conda activate snakemake
python3 -c 'import hail as hl; hl.utils.range_table(10)._force_count()'
Initializing Hail with default parameters...
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2022-03-16 10:18:20 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-03-16 10:18:21 WARN  Hail:43 - This Hail JAR was compiled for Spark 3.1.2, running with Spark 3.1.3.
  Compatibility is not guaranteed.
Running on Apache Spark version 3.1.3
SparkUI available at http://blabla
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.90-ef50e56cae11
LOGGING: writing to /Users/ani/hail-20220316-1018-0.2.90-ef50e56cae11.log

Hmm. This suggests that Hail is installed correctly, at least in that environment.

Can you try running the script you shared above after running conda activate snakemake?

Can you also figure out what environment variables are set inside the snakemake job? Can you run the env command in a snakemake job? It’s possible SPARK_HOME is set incorrectly, or maybe java is not in the right location.
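If running env is awkward, a few lines dropped into the Python script itself would show the same thing (a sketch; `spark_env_report` is just an illustrative helper, and the variable list is an assumption about what matters here):

```python
import os

def spark_env_report(keys=("SPARK_HOME", "JAVA_HOME", "PYSPARK_PYTHON")):
    """Collect the environment variables that influence how PySpark starts up."""
    return {k: os.environ.get(k, "<unset>") for k in keys}

# Dropped into the snakemake script, this prints what the job actually sees:
for name, value in spark_env_report().items():
    print(f"{name}={value}")
```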

I can’t run it without activating the snakemake environment, so the error in the first message already occurs with snakemake activated.

It seems SPARK_HOME is not set. At least that’s what env shows.

Can you try setting SPARK_HOME to point at your Spark installation? If you’re running Spark in local mode (i.e., not on a cluster like Google Dataproc), then SPARK_HOME should look something like:

/Users/ani/opt/anaconda3/envs/snakemake/lib/python3.10/site-packages/pyspark

EDIT: Can you also verify that pyspark is installed in the snakemake environment?
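Rather than typing the path by hand, one way to find the right value is to ask Python where pyspark itself lives (a sketch assuming a pip-installed pyspark, as the paths in this thread suggest; `guess_spark_home` is just an illustrative helper):

```python
import importlib.util
import os

def guess_spark_home():
    """Guess SPARK_HOME from wherever pip installed the pyspark package.

    Assumes pyspark was pip-installed into site-packages; a standalone
    Spark download would live somewhere else entirely.
    """
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None  # pyspark is not importable in this environment
    return os.path.dirname(spec.origin)

home = guess_spark_home()
print("pyspark not installed here" if home is None else f"export SPARK_HOME={home}")
```

Running this inside the activated snakemake environment also doubles as the pyspark-installation check: a None result means pyspark is missing there.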