We are trialing a new on-prem Hail cluster. Our old on-prem cluster doesn't show the problem, and we thought we had replicated its provisioning, but we have clearly missed something.
Writing a MatrixTable works fine, but exporting fails, and we can't work out what to look at or tweak to start diagnosing the problem. Any advice appreciated.
Note: this happens regardless of whether we export Tables or MatrixTables.
Details below.
Import:
mt = hl.import_vcf("s3a://ddd-elgh/chr21.test.vcf.gz", force_bgz=True)
2019-05-24 15:15:10 Hail: WARN: expected input file `s3a://ddd-elgh/chr21.test.vcf.gz’ to end in .vcf[.bgz, .gz]
Writing the MatrixTable works fine:
mt.write("s3a://ddd-elgh/test.mt", overwrite=True)
2019-05-24 15:15:14 Hail: INFO: Coerced sorted dataset
2019-05-24 15:15:25 Hail: INFO: wrote matrix table with 547 rows and 12644 columns in 2 partitions to s3a://ddd-elgh/test.mt
Exporting to VCF blows up:
hl.export_vcf(mt, "s3a://ddd-elgh/test.vcf")
FatalError Traceback (most recent call last)
in
----> 1 hl.export_vcf(mt, "s3a://ddd-elgh/test.vcf")
…
FatalError: HailException: Expected 2 part files but found 0
Java stack trace:
is.hail.utils.HailException: Expected 2 part files but found 0
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:28)
at is.hail.utils.richUtils.RichHadoopConfiguration$.copyMerge$extension(RichHadoopConfiguration.scala:178)
at is.hail.utils.richUtils.RichRDD$.writeTable$extension(RichRDD.scala:84)
at is.hail.io.vcf.ExportVCF$.apply(ExportVCF.scala:466)
at is.hail.expr.ir.MatrixVCFWriter.apply(MatrixWriter.scala:37)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:751)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:87)
at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:31)
at is.hail.backend.spark.SparkBackend$.execute(SparkBackend.scala:49)
at is.hail.backend.spark.SparkBackend$.executeJSON(SparkBackend.scala:16)
at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
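For what it's worth, here is our reading of the trace. This is an assumption; we have not checked it against the Hail source. We think `copyMerge` counts the part files produced by the export tasks and compares that count with the number of partitions. The expected 2 matches the 2 partitions reported by `mt.write`, so the expected count looks right, but no part files were visible where the driver looked. Below is a pure-Python sketch of that kind of check. It is a hypothetical stand-in, not Hail's code. `count_part_files` and `check_parts` are names we made up, and the `part-` prefix is just the usual Hadoop/Spark naming convention.

```python
import os
import tempfile


def count_part_files(directory: str) -> int:
    """Count files named like Hadoop/Spark part files (part-00000, ...)."""
    if not os.path.isdir(directory):
        # A directory the driver cannot see at all also yields zero parts.
        return 0
    return sum(1 for name in os.listdir(directory) if name.startswith("part-"))


def check_parts(directory: str, expected: int) -> None:
    """Hypothetical stand-in for the check that raised our error (not Hail's code)."""
    found = count_part_files(directory)
    if found != expected:
        raise RuntimeError(f"Expected {expected} part files but found {found}")


# Simulate the driver checking a directory where none of the task output landed.
with tempfile.TemporaryDirectory() as driver_view:
    try:
        check_parts(driver_view, expected=2)
    except RuntimeError as err:
        print(err)  # prints: Expected 2 part files but found 0
```

If this model is roughly right, the question for us becomes where the export tasks write their intermediate parts, and whether that location is visible to both the driver and the executors on the new cluster.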