Py4JNetworkError when importing a GTF file

Hello,

I just installed Hail 0.2 (conda create -n hail python=3.7; conda activate hail; pip install hail; conda install jupyterlab), and I’m having issues creating a table from a GTF file.

There is no problem creating the Hail context, but then I get the following error:

gtf=hl.experimental.import_gtf('gencode.v30.all.gtf',reference_genome='GRCh38', skip_invalid_contigs=True)

2019-04-09 14:01:26 Hail: INFO: Reading table with no type imputation
  Loading column 'f0' as type 'str' (type not specified)
  Loading column 'f1' as type 'str' (type not specified)
  Loading column 'f2' as type 'str' (type not specified)
  Loading column 'f3' as type 'int32' (user-specified)
  Loading column 'f4' as type 'int32' (user-specified)
  Loading column 'f5' as type 'float64' (user-specified)
  Loading column 'f6' as type 'str' (type not specified)
  Loading column 'f7' as type 'int32' (user-specified)
  Loading column 'f8' as type 'str' (type not specified)

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving

Any idea of what’s going on? I also tested with my base conda environment and the error remains.
Thanks,
Pedro

Is this the full stack trace? Is there a line of the import_gtf function that throws the error?

There is a step in that function that collects a potentially large amount of data, so it could be an out-of-memory (OOM) error.
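If it does turn out to be an OOM, the usual local fix is to give the driver JVM more heap before Hail starts. A minimal sketch, assuming a pip-installed local Spark; the 4g value is an assumption you’d tune for your machine:

import os

# Must be set before hail/pyspark launch the JVM. '4g' is a guess; adjust
# for your machine. The trailing 'pyspark-shell' token is required when
# passing options through PYSPARK_SUBMIT_ARGS.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 4g pyspark-shell'

import hail as hl
hl.init()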

I guess there was; I can send it to you tomorrow.

Perhaps an important detail: I was on a Mac before. Now, on a Linux machine, the same code runs fine.

This is the full stack trace on my Mac:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-2-ab21ca5d7853> in <module>
----> 1 gtf=hl.experimental.import_gtf('gencode.v30.all.gtf',reference_genome='GRCh38', skip_invalid_contigs=True)

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/experimental/import_gtf.py in import_gtf(path, reference_genome, skip_invalid_contigs, min_partitions)
    132                ht['attribute'].split('; '))))
    133 
--> 134     attributes = ht.aggregate(hl.agg.explode(lambda x: hl.agg.collect_as_set(x), ht['attribute'].keys()))
    135 
    136     ht = ht.transmute(**{x: hl.or_missing(ht['attribute'].contains(x),

</Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-915> in aggregate(self, expr, _localize)

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    559     def wrapper(__original_func, *args, **kwargs):
    560         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 561         return __original_func(*args_, **kwargs_)
    562 
    563     return wrapper

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py in aggregate(self, expr, _localize)
   1133 
   1134         if _localize:
-> 1135             return Env.backend().execute(agg_ir)
   1136         else:
   1137             return construct_expr(agg_ir, expr.dtype)

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py in execute(self, ir)
     91         return ir.typ._from_json(
     92             Env.hail().backend.spark.SparkBackend.executeJSON(
---> 93                 self._to_java_ir(ir)))
     94 
     95     def value_type(self, ir):

~/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

~/anaconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py in deco(*args, **kwargs)
    213         import pyspark
    214         try:
--> 215             return f(*args, **kwargs)
    216         except py4j.protocol.Py4JJavaError as e:
    217             s = e.java_exception.toString()

~/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    334             raise Py4JError(
    335                 "An error occurred while calling {0}{1}{2}".
--> 336                 format(target_id, ".", name))
    337     else:
    338         type = answer[1]

Py4JError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.executeJSON

Yeah, that’s the line that collects the GTF keys locally. How big is the file? How many unique keys are in the ‘attribute’ field?
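For context, that step explodes each row’s attribute keys and collects the distinct names back to the Python driver, and it’s also the first real Spark action import_gtf triggers. A toy sketch of the same pattern (illustrative data, not the exact import_gtf code):

import hail as hl

# Stand-in for the parsed 'attribute' column, which is a dict per row.
ht = hl.utils.range_table(3)
ht = ht.annotate(attribute=hl.dict({'gene_id': 'g0', 'gene_name': 'n0'}))

# Explode each row's keys and collect the distinct names on the driver;
# this is the same shape as the failing ht.aggregate(...) call above.
keys = ht.aggregate(hl.agg.explode(lambda x: hl.agg.collect_as_set(x),
                                   ht['attribute'].keys()))
print(keys)  # e.g. {'gene_id', 'gene_name'}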

The file is the latest GENCODE annotation release (v30, including patches, scaffolds, and haplotypes). It has 3,021,411 entries and apparently 16 unique attributes:

cut -f9 gencode.v30.all.gtf | grep -v "^#" | sed -e $'s/; /\\\n/g' | cut -f1 -d " " | sort | uniq
ccdsid
exon_id
exon_number
gene_id
gene_name
gene_type
havana_gene
havana_transcript
level
ont
protein_id
tag
transcript_id
transcript_name
transcript_support_level
transcript_type

OK, that’s tiny; definitely not an OOM.

This is the first place in import_gtf where Hail calls into Java to execute something, though, so maybe every execution will fail on your Mac. Try something simpler, like:

hl.utils.range_table(10).show()

That’s true; it failed even for that simple command.

How about pip show hail and pip show pyspark?

Name: hail
Version: 0.2.12
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail-team@broadinstitute.org
License: UNKNOWN
Location: /Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages
Requires: matplotlib, parsimonious, seaborn, ipykernel, pandas, numpy, decorator, requests, pyspark, bokeh
Required-by: 
Name: pyspark
Version: 2.2.3
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: dev@spark.apache.org
License: http://www.apache.org/licenses/LICENSE-2.0
Location: /Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages
Requires: py4j
Required-by: hail

And one more: pip show py4j

Name: py4j
Version: 0.10.7
Summary: Enables Python programs to dynamically access arbitrary Java objects
Home-page: https://www.py4j.org/
Author: Barthelemy Dagenais
Author-email: barthelemy@infobart.com
License: BSD License
Location: /Users/pbarbosa/anaconda3/envs/hail/lib/python3.7/site-packages
Requires: 
Required-by: pyspark

This is extremely weird. I assume that if you do:

$ pyspark

>>> spark.range(10).show()

it also errors?

No. I should mention this is an old Mac with limited hardware.

Python 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 15:43:19) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/04/10 19:07:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/10 19:07:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/04/10 19:07:34 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/04/10 19:07:35 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.3
      /_/

Using Python version 3.7.3 (default, Mar 27 2019 15:43:19)
SparkSession available as 'spark'.
>>> 
>>> 
>>> spark.range(10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

I’m pretty stumped.

How old is the operating system? Maybe the native libraries are to blame.

The operating system was recently updated (macOS Sierra 10.12.16).
iMac (21.5-inch, Mid 2010)
Processor - 3.06 GHz Intel Core i3
Memory - 8 GB 1333 MHz DDR3
Graphics - ATI Radeon HD 4670 256 MB

Is there a Hail log file (check the working directory)? Can you upload that?

Is there an hs_err_pid file in the working directory? Can you upload that as well?
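If it helps, here’s a quick way to list the candidates, assuming the defaults: Hail writes a hail-<timestamp>.log file to the working directory, and a JVM crash leaves an hs_err_pid<pid>.log file there as well:

import glob

# Both the Hail log (hail-*.log) and any JVM fatal-error logs
# (hs_err_pid*.log) land in the working directory by default.
print(glob.glob('hail-*.log') + glob.glob('hs_err_pid*.log'))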