Import_vcf, old to new hail switch: py4j.protocol.Py4JNetworkError: Answer from Java side is empty

I know there is another thread about this, but it didn't help me solve the issue:

With Hail 0.2.57 everything works perfectly for me, but when I tried the latest version (0.2.70) I got this error. Here is the full stack trace:

Initializing Hail with default parameters...
Exception in thread "Thread-6" java.lang.NoClassDefFoundError: scala/Product$class
	at is.hail.utils.package$.<init>(package.scala:472)
	at is.hail.utils.package$.<clinit>(package.scala)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(
	at py4j.reflection.ReflectionUtil.classForName(
	at py4j.reflection.TypeUtil.forName(
	at py4j.commands.ReflectionCommand.getUnknownMember(
	at py4j.commands.ReflectionCommand.execute(
Caused by: java.lang.ClassNotFoundException: scala.Product$class
	at java.lang.ClassLoader.loadClass(
	at sun.misc.Launcher$AppClassLoader.loadClass(
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

	at java.lang.ClassLoader.loadClass(
	... 13 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1033, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1212, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR: [pid 28790] Worker Worker(salt=465004285, workers=1, host=ip-172-21-61-13, username=hadoop, pid=28790) failed    SeqrVCFToMTTask(source_paths=["s3://seqr-dp-data--prod/vcf/batch109_subset.vcf"], dest_path=s3://seqr-dp-build--qa/mt-hail-luigi/test/, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://, clinvar_ht_path=s3://, hgmd_like_csv_path=s3://GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://, cidr_ht_path=None, nisc_ht_path=s3://, bgi_ht_path=s3://, hgsc_wes_ht_path=None, hgsc_wgs_ht_path=s3://, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/luigi/", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.7/site-packages/luigi/", line 141, in _run_get_new_deps
    task_gen =
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/", line 51, in run
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/", line 54, in read_vcf_write_mt
    mt = self.import_vcf()
  File "/home/hadoop/hail-elasticsearch-pipelines/luigi_pipeline/lib/", line 105, in import_vcf
    force_bgz=True, min_partitions=500, array_elements_required=self.array_elements_required)
  File "<decorator-gen-1316>", line 2, in import_vcf
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 576, in wrapper
    args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 543, in check_all
    args_.append(arg_check(args[i], name, arg_name, checker))
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 584, in arg_check
    return checker.check(arg, function_name, arg_name)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 82, in check
    return tc.check(x, caller, param)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 328, in check
    return f(tc.check(x, caller, param))
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/genetics/", line 10, in <lambda>
    reference_genome_type = oneof(transformed((str, lambda x: hl.get_reference(x))), rg_type)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/", line 554, in get_reference
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/", line 55, in hc
  File "<decorator-gen-1658>", line 2, in init
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/typecheck/", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/", line 252, in init
    skip_logging_configuration, optimizer_iterations)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/backend/", line 163, in __init__
    self._utils_package_object = scala_package_object(hail_package.utils)
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/", line 122, in scala_package_object
    return scala_object(jpackage, 'package')
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/utils/", line 118, in scala_object
    return getattr(getattr(jpackage, name + '$'), 'MODULE$')
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1644, in __getattr__
    raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: is.hail.utils.package$ does not exist in the JVM

It fails when `import_vcf` is run:

hl.import_vcf([vcf_file for vcf_file in self.source_paths],
              reference_genome='GRCh' + self.genome_version,
              force_bgz=True, min_partitions=500,
              array_elements_required=self.array_elements_required)

What version of Spark are you using? I think between 0.2.57 and 0.2.70 we updated the PyPI artifacts from Spark 2 to Spark 3.
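One quick way to answer that question is to check what a pip-installed pyspark reports. This is a hypothetical diagnostic sketch I'm adding for illustration; the helper name is my own, not from the thread:

```python
import importlib.util

def pip_pyspark_version():
    """Return the version of a pip-installed pyspark, or None if absent."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None:
        return None
    import pyspark
    return pyspark.__version__

# Hail 0.2.70 wheels expect Spark 3 (Scala 2.12). A 2.x result here would
# explain the NoClassDefFoundError: scala/Product$class above, since
# Product$class exists in Scala 2.11 but was removed in 2.12.
print(pip_pyspark_version())
```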


OK, we realized that we were using Spark 2, so we updated it. That resolved the issue, but then we hit a different one:

  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1305, in __call__
    answer, self.gateway_client, self.target_id,
  File "/home/hadoop/.local/lib/python3.7/site-packages/hail/backend/", line 32, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
UnsupportedFileSystemException: No FileSystem for scheme "s3"
Java stack trace:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

When we ran `hail.utils.hadoop_scheme_supported('s3')` to check whether the s3 scheme is supported by Hail, it gave us:

[hadoop@ip-4234 ~]$ python3
Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hail
>>> hail.utils.hadoop_scheme_supported('s3')
Initializing Hail with default parameters...
2021-06-24 20:10:39 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.2
SparkUI available at http://ip-34453.ec2.internal:2323
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.70-5bb98953a4a7
LOGGING: writing to /home/hadoop/hail-20210624-2010-0.2.70-5bb98953a4a7.log
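As an aside, a check like `hadoop_scheme_supported` could also be used to fail fast in the pipeline itself, before `import_vcf` produces a deep Java stack trace. This is a sketch I'm adding for illustration; the helper names are hypothetical, not part of the pipeline in this thread:

```python
def uri_scheme(path):
    """Extract the Hadoop filesystem scheme from a URI, defaulting to 'file'."""
    return path.split("://", 1)[0] if "://" in path else "file"

def check_paths_supported(paths):
    """Raise early if any input path uses a scheme Hail's Hadoop layer can't read.

    Assumes a working Hail installation; hail.utils.hadoop_scheme_supported
    is the same check run interactively above. The import is deferred so
    this module stays importable without Hail.
    """
    import hail as hl
    for p in paths:
        scheme = uri_scheme(p)
        if not hl.utils.hadoop_scheme_supported(scheme):
            raise ValueError(f"No FileSystem for scheme {scheme!r}: {p}")
```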

What’s your runtime? Are you running locally and reading from S3?

Previously it was working just fine with 0.2.57, so I am not sure it's related to some kind of access issue. It's AWS EMR (emr-6.3.0), which accesses s3. But we can also log in to the cluster and work with it locally.

How did you update Spark? Did you use a different EMR image/version? It looks like you have pyspark installed with pip:

  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/", line 1305, in __call__

You might try SSHing into the driver node, running `pip3 uninstall pyspark -y`, and seeing whether that fixes things by falling back to the EMR Spark installation.
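To confirm which installation Python is actually picking up before and after the uninstall, something like this could help (a hypothetical helper, not from the thread):

```python
import importlib.util

def pyspark_origin():
    """Report where 'import pyspark' would resolve from, if anywhere.

    A pip-installed pyspark under site-packages shadows the cluster's
    Spark (e.g. EMR's, which bundles the S3 filesystem support) on the
    driver node, which is how a mismatch like the one above sneaks in.
    """
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return "pyspark not importable; relying on SPARK_HOME / cluster Spark"
    return spec.origin

print(pyspark_origin())
```

If the printed path still points into `site-packages` after uninstalling, another copy is shadowing the EMR one.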
