Looking a bit closer, I bet that what’s happened is the Hail jar isn’t visible to the executors. I definitely want to see the rest of the error message if it exists though!
That's the only error message I see on the console. However, I did find the hail.log, and within it I see an error about changing permissions:
The command: vds.write('/mnt/smb/giannoulatou_lab/Eddie_Ip/working/software/hail_data/all_17102017_vqsr.normalized.vds')
The error in hail.log:
2017-10-26 10:42:20 root: ERROR: ExitCodeException: chmod: changing permissions of ‘/mnt/smb/giannoulatou_lab/Eddie_Ip/working/software/hail_data/all_17102017_vqsr.normalized.vds/metadata.json.gz’: Operation not permitted
From org.apache.hadoop.util.Shell$ExitCodeException: chmod: changing permissions of ‘/mnt/smb/giannoulatou_lab/Eddie_Ip/working/software/hail_data/all_17102017_vqsr.normalized.vds/metadata.json.gz’: Operation not permitted
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at is.hail.utils.richUtils.RichHadoopConfiguration$.is$hail$utils$richUtils$RichHadoopConfiguration$$create$extension(RichHadoopConfiguration.scala:22)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeTextFile$extension(RichHadoopConfiguration.scala:243)
at is.hail.variant.VariantSampleMatrix.writeMetadata(VariantSampleMatrix.scala:2118)
at is.hail.variant.VariantDatasetFunctions$.write$extension(VariantDataset.scala:730)
at is.hail.variant.VariantDatasetFunctions.write(VariantDataset.scala:722)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
I changed the location where I'm writing the VDS to a local directory on my system, and the write now completes with no error; the VDS (folder) is created.
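For reference, a minimal sketch of the workaround, assuming the Hail 0.1 API that the stack trace points to; the VCF source and the local destination path below are just placeholders, not the actual files:

```python
from hail import HailContext

hc = HailContext()

# Placeholder for however the dataset was originally loaded
vds = hc.import_vcf('input.vcf')

# Failing destination: on the SMB mount, where the directory is owned by root
# vds.write('/mnt/smb/giannoulatou_lab/Eddie_Ip/working/software/hail_data/all_17102017_vqsr.normalized.vds')

# Working destination: a local directory owned by my user (placeholder path)
vds.write('/home/myuser/hail_data/all_17102017_vqsr.normalized.vds')
```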
All those directories are owned by root. According to the chmod man page:
Only the owner of a file or the super-user is permitted to change the mode of a file.
I'm fairly confident that your Spark workers are not running as the root user (running them as root is not a good idea anyway). Those directories should really be owned by whichever user your Spark jobs run as. I think that's usually your own username (you can check it with whoami). If your username doesn't work, ask whoever administers your Spark cluster which user Spark jobs run as.
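If it helps, here is a small plain-Python sketch (not part of Hail) for comparing the user a process runs as against the owner of the directory from the error above; note it only reports on the machine and user you run it as, so the Spark executors may still be running as someone else:

```python
import getpass
import os
import pwd

# Parent directory the failing write was targeting (from the error above)
target = '/mnt/smb/giannoulatou_lab/Eddie_Ip/working/software/hail_data'

# User this process runs as (same idea as `whoami` on the command line)
print('running as:', getpass.getuser())

# Owner of the directory; chmod on its contents is only permitted for this
# user or the super-user, per the man page quoted above
stat_result = os.stat(target)
print('owned by:  ', pwd.getpwuid(stat_result.st_uid).pw_name)
```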
Aside: Spark really shouldn’t set the file mode like this. Unfortunately, Spark does not make this easy for us to change. I’ll look into mitigation strategies. Sorry about this.