The directory item limit of /tmp/aggregate_intermediates is exceeded: limit=1048576 items=1048576

Hello Hail Team,

I am in the process of running our Hail-based sample quality control script on roughly 470,000 exomes on the UK Biobank Research Analysis Platform. While running a test, my job failed with the following error:

hail.utils.java.FatalError: RemoteException: The directory item limit of /tmp/aggregate_intermediates is exceeded: limit=1048576 items=1048576
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1277)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:1361)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1184)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirWriteFileOp.java:579)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:398)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2703)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2596)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:799)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:494)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)

I saw this thread indicating that the poster resolved this error by “manually removing the contents of aggregate_intermediates”. What exactly does that mean, and could you provide more detail on how to do it? Thank you in advance.

Hey @mgarcia!

You could manually issue rm -rf /tmp/aggregate_intermediates, but I think an easier, more scalable, and durable fix is to set your tmp_dir to an S3 bucket: hl.init(tmp_dir='s3://...'). Or, perhaps, a DNAnexus bucket/table? I’m not too familiar with the DNAnexus platform.
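
For example, here is a minimal sketch of the tmp_dir approach; the bucket path is a placeholder, so substitute a prefix your project can actually write to:

import hail as hl

# Point Hail's scratch space at object storage instead of the cluster's HDFS,
# so aggregation intermediates are no longer subject to the NameNode's
# per-directory item limit (1,048,576 entries, per the error above).
# 's3://my-project-bucket/hail-tmp/' is a hypothetical path; replace it with
# a bucket/prefix you have write access to.
hl.init(tmp_dir='s3://my-project-bucket/hail-tmp/')

With that set, Hail should write its temporary aggregation files under the given prefix rather than under /tmp/aggregate_intermediates on HDFS.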