I need to re-run a densify script (300K samples; entries: “GT”, “GQ”, “DP”, “adj”, “END”, “AD”) and ran into this error:
hail.utils.java.FatalError: RemoteException: Cannot create file/tmp/table-map-rows-scan-aggs-part-EvP3J35BxG2Ex3gb6iRitu. Name node is in safe mode.
Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:kc-m
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1413)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1400)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2284)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2230)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:745)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
Java stack trace:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/tmp/table-map-rows-scan-aggs-part-EvP3J35BxG2Ex3gb6iRitu. Name node is in safe mode.
Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:kc-m
Any ideas what this means? I’ve switched back to 0.2.40 and restarted the job for now.
I’ll send the script, log, and diagnostic tar file (run after the job failed) via email. Thanks for all the help with my densify issues!
I’d just bump the number of non-preemptible workers to 10 or so. If that doesn’t work, we can also increase their disk size. I think what’s going on here is that the densify scan intermediate is stored on HDFS, which is filling up and killing the job. HDFS lives on the non-preemptible workers, so going from 2 to 10 of them will increase HDFS space by 5x.
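To make the 5x figure concrete, here is a rough back-of-the-envelope sketch. It assumes HDFS capacity scales linearly with the number of non-preemptible workers and their disk size, divided by the replication factor. The function name `hdfs_capacity_gb`, the 500 GB disk size, and the replication factor of 2 are illustrative assumptions, not values taken from this cluster, and the estimate ignores space used by the OS and other non-HDFS data.

```python
def hdfs_capacity_gb(non_preemptible_workers: int,
                     disk_gb_per_worker: float,
                     replication: int = 2) -> float:
    """Rough usable HDFS capacity in GB.

    Assumes only non-preemptible workers contribute HDFS storage and that
    every block is stored `replication` times. Both the default replication
    factor and the disk size used below are assumptions for illustration.
    """
    if non_preemptible_workers < 0 or disk_gb_per_worker < 0 or replication < 1:
        raise ValueError("worker count and disk size must be >= 0, replication >= 1")
    return non_preemptible_workers * disk_gb_per_worker / replication


# Hypothetical 500 GB disk per worker; substitute the real size.
before = hdfs_capacity_gb(2, 500)
after = hdfs_capacity_gb(10, 500)
print(f"before: {before} GB, after: {after} GB, ratio: {after / before}x")
# prints "before: 500.0 GB, after: 2500.0 GB, ratio: 5.0x"
```

The ratio does not depend on the disk size or replication factor, which is why the worker count alone gives the 5x. Increasing the disk size per worker scales the same estimate linearly as well.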
Thanks! Do you think it’s better to restart the job with 10 workers on 0.2.49, or to keep the 0.2.40 job running? My 0.2.40 job has only been running for about an hour.
I’ll probably keep the 0.2.40 job running, but I’ll try adding some workers to a 0.2.49 cluster next time, since I should be running another densify relatively soon. Thanks for the super fast responses!