Hi, I am exporting a MatrixTable to vcf.bgz on GCP Dataproc (2 normal + 15 secondary nodes). This code worked well several months ago, but now it gives me an error (same code, only the input changed).
The code I used:
import hail as hl

hl.init(default_reference='GRCh38')
mt = hl.read_matrix_table(mt_path)  # mt_path is the GCS path to the input MatrixTable
hl.summarize_variants(mt)
hl.export_vcf(mt, 'gs://path/out.vcf.bgz')
The Hail and Spark versions I used, and the error I got:
Running on Apache Spark version 3.1.2
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.89-38264124ad91
Hail: WARN: export_vcf: ignored the following fields:
'variant_qc' (row)
Traceback (most recent call last):
File "/tmp/23d31cfb27854652b0c9c60754140170/step3.1.4_14062022.py", line 8, in <module>
hl.export_vcf(mt, 'gs://path/out.vcf.bgz')
File "<decorator-gen-1330>", line 2, in export_vcf
File "/opt/conda/default/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
return __original_func(*args_, **kwargs_)
File "/opt/conda/default/lib/python3.8/site-packages/hail/methods/impex.py", line 551, in export_vcf
Env.backend().execute(ir.MatrixWrite(dataset._mir, writer))
File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 110, in execute
raise e
File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 86, in execute
result_tuple = self._jhc.backend().executeEncode(jir, stream_codec)
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 29, in deco
raise FatalError('%s\n\nJava stack trace:\n%s\n'
hail.utils.java.FatalError: RemoteException: File /tmp/write-table-concatenated-UiZi2iY4xWo4Q2mFeDyxeJ/_temporary/0/_temporary/attempt_20220614081406830712610524875364_0011_m_010536_19/part-10536.bgz could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2278)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2808)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:905)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
...
There are more error messages; they basically repeat this kind of error:
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:162)
at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
org.apache.hadoop.ipc.RemoteException: File /tmp/write-table-concatenated-UiZi2iY4xWo4Q2mFeDyxeJ/_temporary/0/_temporary/attempt_20220614081406830712610524875364_0011_m_010536_19/part-10536.bgz could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
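One thing I notice is that the failing part files are written to /tmp on the cluster's HDFS rather than to GCS; as far as I understand, that is Hail's default temporary directory on Dataproc. As a sketch of a variant I am considering (the bucket path is a placeholder, and I have not verified that tmp_dir covers these intermediate writes):

import hail as hl

# Point Hail's temporary files at a GCS bucket instead of the
# cluster-local HDFS /tmp ('gs://my-bucket/tmp' is a placeholder).
hl.init(default_reference='GRCh38', tmp_dir='gs://my-bucket/tmp')

mt = hl.read_matrix_table(mt_path)  # mt_path as in the original script
hl.export_vcf(mt, 'gs://path/out.vcf.bgz')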
Any idea what caused this and how I can solve it? Do I need to upgrade to a newer version of Hail? Thanks a lot for your time; any help is welcome.