import_vcf, old-to-new Hail switch: py4j.protocol.Py4JNetworkError: Answer from Java side is empty

I know there is another thread about this, but it didn't help me solve the issue:

With Hail 0.2.57 everything works perfectly for me, but when I tried the latest version, 0.2.70, it gave the error below. Here is the full stack trace:

Initializing Hail with default parameters...
Exception in thread "Thread-6" java.lang.NoClassDefFoundError: scala/Product$class
	at is.hail.relocated.org.json4s.NoTypeHints$.<init>(Formats.scala:429)
	at is.hail.relocated.org.json4s.NoTypeHints$.<clinit>(Formats.scala)
	at is.hail.utils.package$.<init>(package.scala:472)
	at is.hail.utils.package$.<clinit>(package.scala)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
	at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
	at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
	at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
	at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
INFO:py4j.java_gateway:Error while receiving.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 13 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1033, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1212, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR: [pid 28790] Worker Worker(salt=465004285, workers=1, host=ip-172-21-61-13, username=hadoop, pid=28790) failed    SeqrVCFToMTTask(source_paths=["s3://seqr-dp-data--prod/vcf/batch109_subset.vcf"], dest_path=s3://seqr-dp-build--qa/mt-hail-luigi/test/batch109_subset.mt, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://combined_reference_data_grch38.ht, clinvar_ht_path=s3://clinvar.GRCh38.ht, hgmd_like_csv_path=s3://GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://hgmd_hg38.ht, cidr_ht_path=None, nisc_ht_path=s3://NISC.ht, bgi_ht_path=s3://BGI.ht, hgsc_wes_ht_path=None, hgsc_wgs_ht_path=s3://HGSC_WGS.ht, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/luigi/worker.py", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.7/site-packages/luigi/worker.py", line 141, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/seqr_loading.py", line 51, in run
    self.read_vcf_write_mt()
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/seqr_loading.py", line 54, in read_vcf_write_mt
    mt = self.import_vcf()
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/lib/hail_tasks.py", line 105, in import_vcf
    force_bgz=True, min_partitions=500, array_elements_required=self.array_elements_required)
  File "<decorator-gen-1316>", line 2, in import_vcf
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 576, in wrapper
    args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 543, in check_all
    args_.append(arg_check(args[i], name, arg_name, checker))
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 584, in arg_check
    return checker.check(arg, function_name, arg_name)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 82, in check
    return tc.check(x, caller, param)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 328, in check
    return f(tc.check(x, caller, param))
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/genetics/reference_genome.py", line 10, in <lambda>
    reference_genome_type = oneof(transformed((str, lambda x: hl.get_reference(x))), rg_type)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/context.py", line 554, in get_reference
    Env.hc()
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/java.py", line 55, in hc
    init()
  File "<decorator-gen-1658>", line 2, in init
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/context.py", line 252, in init
    skip_logging_configuration, optimizer_iterations)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/backend/spark_backend.py", line 163, in __init__
    self._utils_package_object = scala_package_object(hail_package.utils)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/java.py", line 122, in scala_package_object
    return scala_object(jpackage, 'package')
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/java.py", line 118, in scala_object
    return getattr(getattr(jpackage, name + '$'), 'MODULE$')
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1644, in __getattr__
    raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: is.hail.utils.package$ does not exist in the JVM

It fails when import_vcf is run:

hl.import_vcf([vcf_file for vcf_file in self.source_paths],
              reference_genome='GRCh' + self.genome_version,
              force_bgz=True, min_partitions=500,
              array_elements_required=self.array_elements_required)

What version of Spark are you using? I think between 0.2.57 and 0.2.70 we updated from Spark 2 to Spark 3 in the PyPI artifacts.
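If it helps, one quick way to confirm which Spark your Python environment and Hail are actually picking up is something like this (a minimal sketch, assuming a Python session on the driver node):

import pyspark
print(pyspark.__version__)   # version of the pyspark package on sys.path
print(pyspark.__file__)      # where that pyspark package is installed

import hail as hl
hl.init()
print(hl.spark_context().version)  # Spark version of the JVM Hail is attached to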


OK, we realized that we were using Spark 2, so we updated it. That resolved the original issue, but we then ran into a different one:

  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 32, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
hail.utils.java.FatalError: UnsupportedFileSystemException: No FileSystem for scheme "s3"
Java stack trace:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

When we ran a check to see whether the s3 scheme is supported by Hail, it gave us:

[hadoop@ip-4234 ~]$ python3
Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import hail
>>> 
>>> hail.utils.hadoop_scheme_supported('s3')
Initializing Hail with default parameters...
2021-06-24 20:10:39 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.2
SparkUI available at http://ip-34453.ec2.internal:2323
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.70-5bb98953a4a7
LOGGING: writing to /home/hadoop/hail-20210624-2010-0.2.70-5bb98953a4a7.log
False

What’s your runtime? Are you running locally and reading from S3?

Previously it was working just fine with 0.2.57, so I am not sure it's related to some kind of access issue. It's AWS EMR (emr-6.3.0), which accesses S3. But we can also log in to the cluster and work with it locally.

How did you update Spark? Did you use a different EMR image/version? It looks like you have pyspark installed with pip:

  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__

You might try SSHing into the driver node, running pip3 uninstall pyspark -y, and seeing whether that fixes things by falling back to the EMR Spark installation.
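After removing the pip-installed pyspark, a minimal sanity check (assuming the EMR-provided Spark then ends up on the Python path) would be to re-run the scheme test from above:

import pyspark
print(pyspark.__file__)  # should now resolve to the EMR Spark installation, not site-packages

import hail as hl
hl.init()
print(hl.utils.hadoop_scheme_supported('s3'))  # expect True once EMRFS is on the classpath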
