I am trying to run a basic script on a Spark cluster that takes in a file, converts it, and outputs it in a different format. At the moment the cluster consists of one master and one slave, both running on the same node. The full command is:
nohup spark-submit --master spark://tr-nodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf spark.executor.extraClassPath=./hail-all-spark.jar ./hail_scripts/v02/convert_vcf_to_hail.py /clinvar_37.vcf -ht --genome-version 37 --output /seqr-reference-hail2/clinvar_37.ht &
And it fails with the following error:
Traceback (most recent call last):
  File "/opt/seqr/hail-elasticsearch-pipelines/./hail_scripts/v02/convert_vcf_to_hail.py", line 38, in <module>
    mt = import_vcf(vcf_path, args.genome_version, force_bgz=True, min_partitions=10000, drop_samples=True)
  File "/opt/seqr/hail-elasticsearch-pipelines/hail_scripts/v02/utils/hail_utils.py", line 71, in import_vcf
    skip_invalid_loci=skip_invalid_loci)
  File "</opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/decorator.py:decorator-gen-1246>", line 2, in import_vcf
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args, **kwargs)
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/methods/impex.py", line 2106, in import_vcf
    return MatrixTable(MatrixRead(reader, drop_cols=drop_samples))
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/matrixtable.py", line 557, in __init__
    self._type = self._mir.typ
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/ir/base_ir.py", line 328, in typ
    self._compute_type()
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/ir/matrix_ir.py", line 60, in _compute_type
    self._type = Env.backend().matrix_type(self)
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/backend/backend.py", line 124, in matrix_type
    jir = self._to_java_ir(mir)
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/backend/backend.py", line 105, in _to_java_ir
    ir._jir = ir.parse(r(ir), ir_map=r.jirs)
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/ir/base_ir.py", line 336, in parse
    return Env.hail().expr.ir.IRParser.parse_matrix_ir(code, ref_map, ir_map)
  File "/opt/seqr/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: IllegalStateException: unread block data
I looked online and found a thread suggesting that setting JAVA_HOME can fix this error. Following that suggestion, I exported the JAVA_HOME environment variable in spark-env.sh and restarted the cluster, but it did not help.
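For reference, what I added to conf/spark-env.sh looks roughly like this (the JDK path below is illustrative, not the exact path on my node):

    # conf/spark-env.sh -- sourced by the standalone master and slave on startup
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # illustrative path
    export PATH="$JAVA_HOME/bin:$PATH"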
The following command works fine:
spark-submit --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf spark.executor.extraClassPath=./hail-all-spark.jar ./hail_scripts/v02/convert_vcf_to_hail.py /hgmd_pro_2019.3_hg19_noDB.vcf -ht --genome-version 37 --output /seqr-reference-hail2/hgmd_2019.3_hg19_noDB.ht
when run directly from the terminal (in local mode, I assume). So I suspect the main issue is that the driver and the slave somehow differ in their Java setup. How can this be fixed, given that just setting JAVA_HOME does not work?
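To narrow it down, I could compare the Java version the driver JVM actually runs against the java that an executor on the slave picks up. A minimal diagnostic sketch (assuming the same master URL as above and that java is on the slave's PATH):

    import subprocess
    from pyspark.sql import SparkSession

    # Connect to the same standalone master as the failing job.
    spark = SparkSession.builder.master("spark://tr-nodedev1:7077").getOrCreate()
    sc = spark.sparkContext

    # Java version of the driver-side JVM, queried through the py4j gateway.
    print("driver :", sc._jvm.System.getProperty("java.version"))

    def executor_java_version(_):
        # `java -version` prints to stderr; return its first line.
        out = subprocess.run(["java", "-version"], capture_output=True, text=True)
        return out.stderr.strip().splitlines()[0]

    # Run the check inside a single task, so it executes on an executor.
    print("executor:", sc.parallelize([0], numSlices=1).map(executor_java_version).first())

If the two versions differ, the slave is apparently not picking up the JAVA_HOME from spark-env.sh, and one thing I could try is passing it per job via --conf spark.executorEnv.JAVA_HOME=/path/to/jdk (spark.executorEnv.* is a documented Spark setting; the JDK path here is a placeholder).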