Error Manhattan plot in AWS (Spark 2.3)


#1

Hi Hail team,

I have an issue with the hail.plot.manhattan function only in AWS.

I used the same version of Hail, 0.2-e08cc2a17c4a (one of the latest), in GCP and in AWS. The only difference is that I am using Apache Spark version 2.2.1 in GCP and Spark version 2.3 in AWS.

In GCP, the exact same script produces the Manhattan plot without any issues.

The error is below:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-src.zip/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-src.zip/py4j/java_gateway.py", line 908, in send_command
    response = connection.send_command(command)
  File "/usr/lib/spark/python/lib/py4j-src.zip/py4j/java_gateway.py", line 1067, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-9-d7d80f793aec> in <module>
----> 1 manhattan = hl.plot.manhattan(gwas.p_value)

<decorator-gen-1274> in manhattan(pvals, locus, title, size, hover_fields, collect_all, n_divisions, significance_line)

~/hail-python.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    558     def wrapper(__original_func, *args, **kwargs):
    559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 560         return __original_func(*args_, **kwargs_)
    561 
    562     return wrapper

~/hail-python.zip/hail/plot/plots.py in manhattan(pvals, locus, title, size, hover_fields, collect_all, n_divisions, significance_line)
    339         res = agg_f(aggregators.downsample(locus.global_position(), pvals,
    340                                            label=hail.array([hail.str(x) for x in hover_fields.values()]),
--> 341                                            n_divisions=n_divisions))
    342         fields = [point[2] for point in res]
    343         for idx, key in enumerate(list(hover_fields.keys())):

<decorator-gen-838> in aggregate(self, expr, _localize)

~/hail-python.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    558     def wrapper(__original_func, *args, **kwargs):
    559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 560         return __original_func(*args_, **kwargs_)
    561 
    562     return wrapper

~/hail-python.zip/hail/table.py in aggregate(self, expr, _localize)
   1141 
   1142         if _localize:
-> 1143             return Env.backend().execute(agg_ir)
   1144         else:
   1145             return construct_expr(agg_ir, expr.dtype)

~/hail-python.zip/hail/backend/backend.py in execute(self, ir)
     36         return ir.typ._from_json(
     37             Env.hail().expr.ir.Interpret.interpretJSON(
---> 38                 self._to_java_ir(ir)))
     39 
     40     def table_read_type(self, tir):

/usr/lib/spark/python/lib/py4j-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1158         answer = self.gateway_client.send_command(command)
   1159         return_value = get_return_value(
-> 1160             answer, self.gateway_client, self.target_id, self.name)
   1161 
   1162         for temp_arg in temp_args:

~/hail-python.zip/hail/utils/java.py in deco(*args, **kwargs)
    210         import pyspark
    211         try:
--> 212             return f(*args, **kwargs)
    213         except py4j.protocol.Py4JJavaError as e:
    214             s = e.java_exception.toString()

/usr/lib/spark/python/lib/py4j-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326             raise Py4JError(
    327                 "An error occurred while calling {0}{1}{2}".
--> 328                 format(target_id, ".", name))
    329     else:
    330         type = answer[1]

Py4JError: An error occurred while calling z:is.hail.expr.ir.Interpret.interpretJSON


Thanks for your help.

Ines

#2

I think that the JVM is running out of memory and dying.

How many variants?


#3

The problem is we need to be able to store the chr/pos/p-value information locally, and that can be big.


#4

My MT file initially contained 219,154,452 variants.

However, I am running the GWAS on only 7,625,494 variants because I applied this filter:
mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.05)

I used the same cluster size on both clouds: 12 worker nodes (16 cores, 60-64 GB per core). I also checked, and only about half of the memory is used during the analysis. So why does it fail on AWS when it works on GCP?


#5

The important memory metric here is the driver machine memory - that’s what’s going OOM.

Is that different between your GCP and AWS runtimes?
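
One way to check is to compare the driver-related Spark properties from each environment. The snippet below is only a minimal sketch: the helper `diff_driver_conf` and the example values are hypothetical, not taken from this thread. It assumes you have already dumped each cluster's configuration as key/value pairs, for example from `SparkContext.getConf().getAll()` in a live session.

```python
# Spark properties that govern the driver JVM.
DRIVER_KEYS = (
    "spark.driver.memory",
    "spark.driver.maxResultSize",
    "spark.driver.cores",
)


def diff_driver_conf(conf_a, conf_b, keys=DRIVER_KEYS):
    """Return {key: (value_a, value_b)} for driver settings that differ.

    conf_a and conf_b are iterables of (key, value) pairs or dicts.
    A key missing from one side shows up as None on that side.
    """
    a, b = dict(conf_a), dict(conf_b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical values for illustration only; substitute your own dumps.
gcp = [("spark.driver.memory", "40g"), ("spark.driver.maxResultSize", "0")]
aws = [("spark.driver.memory", "8g"), ("spark.driver.maxResultSize", "0")]

print(diff_driver_conf(gcp, aws))
```

Any key reported here is a candidate explanation for a job that succeeds on one cluster but kills the driver JVM on the other.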


#6

Indeed, it was the hardware: my driver node had only 32 GB, against 500 GB for my worker nodes. I changed it and now it works on AWS.

Thank you !


#7

:+1: