Vds.write error

jerryliu2005 · September 21, 2017, 7:28pm

Hello,

I got the following error when trying to write out the VDS. The error summary is at the end. Could you suggest how to fix it? Many thanks!
Jerry

vds.write("all.vds")

Errors <<<<<
Hail version: 0.1-38882df
Error summary: RemoteException: File all.vds/rdd.parquet/_temporary/0/_temporary/attempt_20170921152120_0039_m_000087_3/part-00087-ddf85ff9-63af-45f8-8597-52f73dbd7dfc.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 9 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3325)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:679)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:489)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

tpoterba · September 21, 2017, 7:29pm

Looks like you’re writing to an HDFS system that’s full or down. What does your cluster setup look like?

jerryliu2005 · September 21, 2017, 7:32pm

Thanks very much for your quick reply. We are using a Cloudera cluster here. I will pass your suggestion on to our IT team so they can look into it.

tpoterba · September 21, 2017, 7:35pm

To fix the issue for now, you can write to NFS instead of HDFS. The default file scheme is HDFS, so you’ll need to prefix the path with file:// to write to NFS. This’ll look something like file:///path/to/vds...
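A minimal Python sketch of turning an absolute local path into the file:// form described above. The helper name to_file_uri and the example path are hypothetical; the only point it illustrates is that "file://" plus a path that already starts with "/" gives the three slashes. It does not percent-escape spaces or other special characters.

```python
import posixpath


def to_file_uri(local_path):
    """Prefix an absolute POSIX path with the file:// scheme.

    "file://" supplies two slashes and the absolute path supplies the third,
    which is what marks the destination as the local/NFS filesystem instead
    of the default (HDFS).
    """
    if not posixpath.isabs(local_path):
        # A relative path would not produce the three-slash form.
        raise ValueError(f"expected an absolute path, got {local_path!r}")
    return "file://" + posixpath.normpath(local_path)


# Hypothetical NFS destination; in Hail 0.1 the result would be passed to vds.write(...).
print(to_file_uri("/path/to/all.vds"))
```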

jerryliu2005 · September 21, 2017, 7:48pm

I tried that, but got an IOException reporting a Mkdirs failure. I will ask our IT about it. Thanks very much again!

tpoterba · September 21, 2017, 7:49pm

That’s probably a permission issue or a mistyped path; you should be able to write to NFS. Make sure there are three forward slashes after file:, as in file:///path/to/vds.
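Since a Mkdirs failure usually comes down to permissions or a wrong path, a quick pre-flight check from Python can narrow it down. This is a sketch with a hypothetical helper (check_local_output): it finds the nearest existing ancestor of the target and checks that it is a writable directory. It only checks the machine it runs on (the driver); with a file:// path the executors on the data nodes also need the same mount and permissions.

```python
import os


def check_local_output(path):
    """Return a list of likely problems with creating a local output directory.

    An empty list means the nearest existing ancestor is a directory the
    current user can write to, so a permission-related Mkdirs error there is
    unlikely (on this machine).
    """
    problems = []
    if path.startswith("file://"):
        path = path[len("file://"):]
    if not os.path.isabs(path):
        problems.append(f"not an absolute path: {path!r} (check the number of slashes)")
        return problems
    # Walk up to the nearest directory that already exists; everything below
    # it would have to be created by mkdirs. "/" always exists, so this ends.
    parent = os.path.dirname(os.path.normpath(path))
    while not os.path.exists(parent):
        parent = os.path.dirname(parent)
    if not os.path.isdir(parent):
        problems.append(f"{parent} exists but is not a directory")
    elif not os.access(parent, os.W_OK | os.X_OK):
        problems.append(f"no write permission on {parent}")
    return problems


# Hypothetical destination; an empty list means no obvious problem was found.
print(check_local_output("file:///path/to/all.vds"))
```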

jerryliu2005 · September 21, 2017, 7:53pm

I used two forward slashes followed by the full path, which itself starts with a single "/", so three slashes in total. I’ll have our IT look into it. Thanks!

tpoterba · September 21, 2017, 7:53pm

Got it, best of luck!

jerryliu2005 · September 22, 2017, 2:16pm

Apparently the local NFS space allocated on our data nodes is too small. Our HDFS is working: hdfs dfs -df also says only 5% of the 390TB total is used. I had no problem writing out a VDS for chr20 (18GB vcf.bgz input), but the current input is 931GB (~3900 WGS samples). When you mentioned HDFS being full, did you mean the whole HDFS or just the HDFS path I’m trying to write to? Our current Spark cluster setup is 1 gateway node, 3 management nodes, and 9 data nodes, each with 32 CPUs, 177GB of memory, and 44TB of HDFS storage. Do you think this is sufficient for the Hail task I’m doing?
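A back-of-the-envelope check of the numbers in this post, in Python. Node counts and sizes are copied from the post; the replication factor of 3 is an assumption (it is the HDFS default, but this cluster's setting isn't stated), and treating the output as input-sized is only a yardstick, not a prediction of VDS size.

```python
# Figures taken from the post; REPLICATION is an assumption.
NODES = 9
CORES_PER_NODE = 32
MEM_GB_PER_NODE = 177
HDFS_TB_PER_NODE = 44

REPORTED_HDFS_TB = 390   # total HDFS capacity reported
USED_PERCENT = 5         # share reported as used
INPUT_GB = 931           # compressed vcf.bgz input
REPLICATION = 3          # assumption: HDFS default dfs.replication

total_cores = NODES * CORES_PER_NODE
total_mem_gb = NODES * MEM_GB_PER_NODE
total_hdfs_tb = NODES * HDFS_TB_PER_NODE   # close to the ~390TB reported
free_tb = REPORTED_HDFS_TB * (100 - USED_PERCENT) / 100

# Rough yardstick only: assumes an output about as large as the input.
replicated_copy_tb = INPUT_GB * REPLICATION / 1000

print(f"cores: {total_cores}, memory: {total_mem_gb} GB, HDFS: {total_hdfs_tb} TB")
print(f"free HDFS: ~{free_tb} TB; input-sized output x{REPLICATION}: ~{replicated_copy_tb} TB")
```

Under those assumptions the free space is roughly two orders of magnitude larger than a replicated, input-sized copy, which is consistent with HDFS not simply being full.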

Thanks!

jerryliu2005 · September 22, 2017, 8:50pm

Just to add: when I started pyspark, I followed the tutorial for Cloudera clusters:
pyspark2 --jars build/libs/hail-all-spark.jar \
    --py-files build/distributions/hail-python.zip \
    --conf spark.sql.files.openCostInBytes=1099511627776 \
    --conf spark.sql.files.maxPartitionBytes=1099511627776 \
    --conf spark.hadoop.parquet.block.size=1099511627776

Do I need to adjust parquet.block.size, or should I just leave it out?
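When experimenting with that option, it can help to assemble the command as an argument list with plain double-dash flags and a switch for the Parquet setting. This is a sketch: hail_pyspark_argv is a hypothetical helper, the paths and values are the ones from the tutorial command above, and it only builds and prints the command (it does not launch anything).

```python
import shlex

TIB = 2 ** 40  # 1099511627776, the value used in the tutorial command


def hail_pyspark_argv(jar, pyfiles, parquet_block_size=None):
    """Build the tutorial's pyspark2 command as an argv list.

    parquet_block_size=None leaves spark.hadoop.parquet.block.size out, so the
    cluster default applies; pass a byte count to set it explicitly.
    """
    confs = {
        "spark.sql.files.openCostInBytes": TIB,
        "spark.sql.files.maxPartitionBytes": TIB,
    }
    if parquet_block_size is not None:
        confs["spark.hadoop.parquet.block.size"] = parquet_block_size
    argv = ["pyspark2", "--jars", jar, "--py-files", pyfiles]
    for key, value in confs.items():
        argv += ["--conf", f"{key}={value}"]
    return argv


argv = hail_pyspark_argv("build/libs/hail-all-spark.jar", "build/distributions/hail-python.zip")
print(shlex.join(argv))  # paste into a shell, or pass argv to subprocess.run
```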

tpoterba · September 22, 2017, 9:28pm

I’m not entirely sure. It’s certainly safe to leave it in. I think Hail will error out when constructing a HailContext if the SparkContext isn’t properly configured.

tpoterba · September 22, 2017, 9:29pm

If there was HDFS space left, then the “could not be replicated to min number of data nodes” issue could be something else. Hmm…

jerryliu2005 · September 23, 2017, 11:48am

Just to report back in case it’s useful for others.
I tested leaving the spark.hadoop.parquet.block.size option out while starting Hail and it works. According to our IT the default parquet.block.size is set at 128MB. vds was written successfully and tested to be valid.

jerryliu2005 · September 24, 2017, 12:39am

A minor correction: our default parquet.block.size turns out to be 1GB (exactly 1073741824 bytes), not 128MB. It works fine with the default.
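For reference, a short Python helper that expresses these byte counts in binary units shows how far apart the values in this thread are: the tutorial's 1099511627776 is exactly 1 TiB, the cluster default 1073741824 is exactly 1 GiB (1024 times smaller), and 128MB, read here as 128 MiB, is an eighth of a GiB. describe_bytes is a hypothetical helper written for this illustration.

```python
def describe_bytes(n):
    """Express a byte count in the largest binary unit it reaches."""
    for unit, power in (("TiB", 40), ("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if n >= 2 ** power:
            return f"{n / 2 ** power:g} {unit}"
    return f"{n} B"


print(describe_bytes(1099511627776))     # tutorial setting
print(describe_bytes(1073741824))        # this cluster's default
print(describe_bytes(128 * 2 ** 20))     # the 128MB first reported
print(1099511627776 // 1073741824)       # ratio between the two
```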

tpoterba · September 26, 2017, 9:45am

Interesting. In this case I think the parquet.block.size parameter may be ignored or overridden. Because of the on-disk ordering system we’ve built, we need to read each Parquet file as a single Spark partition, so we use other config options to ensure that Parquet files are never split.
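To illustrate how "other config options" can keep each Parquet file in one partition, here is a simplified Python model of how Spark 2.x packs splittable files into read partitions. It is based on my reading of Spark's FileSourceScanExec and ignores bucketing, partition pruning, and non-splittable formats; plan_partitions is a hypothetical name, and the file sizes and core count are made up. With spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes both at 1 TiB, as in the tutorial command, no file under 1 TiB is split and each file ends up alone in its partition.

```python
MiB, GiB, TiB = 2 ** 20, 2 ** 30, 2 ** 40


def plan_partitions(file_sizes, max_partition_bytes, open_cost_in_bytes, default_parallelism):
    """Simplified model of Spark 2.x file-split planning.

    Returns (number_of_partitions, whether_any_file_was_split).
    """
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

    # Cut every file into chunks of at most max_split_bytes.
    chunks = []
    for size in file_sizes:
        for offset in range(0, size, max_split_bytes):
            chunks.append(min(max_split_bytes, size - offset))
    any_split = len(chunks) > sum(1 for size in file_sizes if size > 0)

    # Next-fit decreasing: each chunk also charges open_cost_in_bytes, and a
    # new partition starts when the next chunk would overflow max_split_bytes.
    partitions, current, is_open = 0, 0, False
    for chunk in sorted(chunks, reverse=True):
        if is_open and current + chunk > max_split_bytes:
            partitions, current, is_open = partitions + 1, 0, False
        current += chunk + open_cost_in_bytes
        is_open = True
    if is_open:
        partitions += 1
    return partitions, any_split


files = [1 * GiB] * 3                                     # hypothetical part files
tutorial = plan_partitions(files, TiB, TiB, 288)           # tutorial settings
defaults = plan_partitions(files, 128 * MiB, 4 * MiB, 288) # Spark's default values
print(tutorial, defaults)
```

With the Spark defaults (128 MiB and 4 MiB) the same three files are cut into many smaller splits, which is exactly what the 1 TiB settings are meant to prevent.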

This should get a bit simpler in the next stable version!

jerryliu2005 · September 26, 2017, 1:07pm

That’s what I thought, too. Just out of curiosity, what Spark partition size does Hail require? Is it 1GB (1073741824 bytes)? The Hail tutorial sets parquet.block.size to 1TB. I’m wondering whether, when I was trying to write the VDS out, Hail somehow pre-calculated the required HDFS storage as #partitions × block size × replication factor, which then caused the not-enough-space error. Could that be the case?
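The hypothesis can be put into numbers with a short Python sketch. It implements the formula proposed above (space reserved = number of blocks × block size × replication factor) against the cluster capacity from this thread. The replication factor of 3 is an assumption, used space is ignored, and whether Hail, Parquet, or HDFS actually reserves space this way is not established in this thread.

```python
GiB = 2 ** 30
TiB = 2 ** 40

CAPACITY_BYTES = 9 * 44 * 10 ** 12   # 9 data nodes x 44TB each, from this thread
REPLICATION = 3                      # assumption: HDFS default dfs.replication


def reserved_bytes(n_blocks, block_size, replication=REPLICATION):
    """Space the hypothesis says would be set aside up front."""
    return n_blocks * block_size * replication


def max_blocks_that_fit(block_size, capacity=CAPACITY_BYTES, replication=REPLICATION):
    """Largest block count whose hypothetical reservation fits in capacity."""
    return capacity // (block_size * replication)


print(max_blocks_that_fit(TiB))                   # 1 TiB blocks (tutorial): 120
print(max_blocks_that_fit(GiB))                   # 1 GiB blocks (cluster default): 122934
print(reserved_bytes(288, TiB) > CAPACITY_BYTES)  # hypothetical: one block per core
```

Under this hypothesis, 1 TiB blocks would leave room for only about 120 simultaneous blocks, fewer than the 288 cores that could be writing at once, while 1 GiB blocks leave room for over 100,000. That would be consistent with what was observed, but it is a plausibility check, not a diagnosis.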
