Error in calling vcf_combiner

iwong · July 8, 2021, 12:21pm

Hi!

I am running into a java.lang.ArrayIndexOutOfBoundsException when trying to call Hail’s GVCF combiner (hail.experimental.run_combiner). I’ve attached the full error output below. Do you have any suggestions on how to diagnose what might be causing the error? As far as I can tell, my input files are properly formatted.

Here is a rough copy of my code:

import hail as hl

# ....................#
## Parse user inputs ##
# ....................#

hl.experimental.run_combiner(
    gvcf_list,
    sample_names=samples_list,
    header=args.gvcf_header_file,
    out_file=args.output_cloud_path,
    tmp_path=args.tmp_bucket,
    key_by_locus_and_alleles=True,
    overwrite=args.overwrite_existing,
    reference_genome='GRCh38',
    use_exome_default_intervals=True,
    target_records=10000
)

Error Message:

Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.61-3c86d3ba497a
LOGGING: writing to /home/hail/combiner.log
2021-06-21 18:37:37 Hail: INFO: Using 65 intervals with default exome size 60000000 as partitioning for GVCF import
2021-06-21 18:37:37 Hail: INFO: GVCF combiner plan:
    Branch factor: 100
    Batch size: 100
    Combining 5 input files in 1 phases with 1 total jobs.
        Phase 1: 1 job corresponding to 1 final output file.

2021-06-21 18:37:37 Hail: INFO: Starting phase 1/1, merging 5 input GVCFs in 1 job.
2021-06-21 18:37:37 Hail: INFO: Starting phase 1/1, job 1/1 to create 1 merged file, corresponding to ~100.0% of total I/O.

[Stage 0:>                                                         (0 + 8) / 65]
[Stage 0:>                                                        (0 + 15) / 65]
[Stage 0:>                                                         (0 + 8) / 65]
[Stage 0:>                                                        (0 + 12) / 65]
Traceback (most recent call last):
  File "/tmp/a335abd27f1041da8eaffc174c60366b/test_combiner.py", line 38, in <module>
    target_records=10000
  File "/opt/conda/default/lib/python3.6/site-packages/hail/experimental/vcf_combiner/vcf_combiner.py", line 681, in run_combiner
    final_mt.write(out_file, overwrite=overwrite)
  File "<decorator-gen-1231>", line 2, in write
  File "/opt/conda/default/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/opt/conda/default/lib/python3.6/site-packages/hail/matrixtable.py", line 2528, in write
    Env.backend().execute(ir.MatrixWrite(self._mir, writer))
  File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 98, in execute
    raise e
  File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 74, in execute
    result = json.loads(self._jhc.backend().executeJSON(jir))
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 32, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
hail.utils.java.FatalError: ArrayIndexOutOfBoundsException: null

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 20 times, most recent failure: Lost task 2.19 in stage 0.0 (TID 145, test-w-1.c.strokeanderson-hail.internal, executor 2): java.lang.ArrayIndexOutOfBoundsException

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1892)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1880)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1879)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2113)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2062)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2051)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:989)
	at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:166)
	at is.hail.rvd.RVD.writeRowsSplit(RVD.scala:952)
	at is.hail.expr.ir.MatrixValue.write(MatrixValue.scala:246)
	at is.hail.expr.ir.MatrixNativeWriter.apply(MatrixWriter.scala:61)
	at is.hail.expr.ir.WrappedMatrixWriter.apply(MatrixWriter.scala:40)
	at is.hail.expr.ir.Interpret$.run(Interpret.scala:825)
	at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:14)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:12)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
	at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:354)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:338)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:335)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:25)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:23)
	at is.hail.utils.package$.using(package.scala:618)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:23)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:247)
	at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:335)
	at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:379)
	at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:377)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:377)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

java.lang.ArrayIndexOutOfBoundsException: null

tpoterba · July 13, 2021, 6:36pm

This is a super weird stack trace. Where are you running this code? Local/Cloud? I think it would also help to update to the latest Hail version so line numbers are current for our debugging purposes.

iwong · July 15, 2021, 8:37pm

I am running this through the Broad’s Terra platform. I updated to the latest Hail version; here is the new error log. The code is the same as before.

Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.71-f3a54b530979
LOGGING: writing to /home/hail/combiner.log

2021-07-15 20:27:55 Hail: INFO: Using 65 intervals with default exome size 60000000 as partitioning for GVCF import
2021-07-15 20:27:55 Hail: INFO: GVCF combiner plan:
    Branch factor: 100
    Phase 1 batch size: 100
    Combining 5 input files in 1 phases with 1 total jobs.
        Phase 1: 1 job corresponding to 1 final output file.

2021-07-15 20:27:55 Hail: INFO: Starting phase 1/1, merging 5 input GVCFs in 1 job.
2021-07-15 20:27:55 Hail: INFO: Starting phase 1/1, job 1/1 to create 1 merged file, corresponding to ~100.0% of total I/O.

[Stage 0:>                                                         (0 + 8) / 65]
Traceback (most recent call last):
  File "/tmp/6fb884c676ca495084aafbe35adbf283/test_combiner.py", line 36, in <module>
    hl.experimental.run_combiner(
  File "/opt/conda/default/lib/python3.8/site-packages/hail/experimental/vcf_combiner/vcf_combiner.py", line 705, in run_combiner
    final_mt.write(out_file, overwrite=overwrite)
  File "<decorator-gen-1237>", line 2, in write
  File "/opt/conda/default/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/opt/conda/default/lib/python3.8/site-packages/hail/matrixtable.py", line 2529, in write
    Env.backend().execute(ir.MatrixWrite(self._mir, writer))
  File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 98, in execute
    raise e
  File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 74, in execute
    result = json.loads(self._jhc.backend().executeJSON(jir))
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/opt/conda/default/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 30, in deco
    raise FatalError('%s\n\nJava stack trace:\n%s\n'
hail.utils.java.FatalError: ArrayIndexOutOfBoundsException: null

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 20 times, most recent failure: Lost task 0.19 in stage 0.0 (TID 166) (test-w-0.c.strokeanderson-hail.internal executor 1): java.lang.ArrayIndexOutOfBoundsException

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2254)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2203)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2202)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2202)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1078)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1078)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1078)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2441)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2383)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2372)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2202)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2223)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2242)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2267)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
	at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:176)
	at is.hail.rvd.RVD.writeRowsSplit(RVD.scala:978)
	at is.hail.expr.ir.MatrixValue.write(MatrixValue.scala:257)
	at is.hail.expr.ir.MatrixNativeWriter.apply(MatrixWriter.scala:67)
	at is.hail.expr.ir.WrappedMatrixWriter.apply(MatrixWriter.scala:45)
	at is.hail.expr.ir.Interpret$.run(Interpret.scala:790)
	at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:56)
	at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.InterpretNonCompilable$.rewrite$1(InterpretNonCompilable.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:15)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:15)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:12)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:14)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:12)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:29)
	at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
	at is.hail.backend.spark.SparkBackend.$anonfun$execute$1(SparkBackend.scala:365)
	at is.hail.expr.ir.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
	at is.hail.utils.package$.using(package.scala:627)
	at is.hail.expr.ir.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
	at is.hail.utils.package$.using(package.scala:627)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:46)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
	at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:362)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeJSON$1(SparkBackend.scala:406)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:404)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

java.lang.ArrayIndexOutOfBoundsException: null
	at 




Hail version: 0.2.71-f3a54b530979
Error summary: ArrayIndexOutOfBoundsException: null

[Stage 0:>                                                         (0 + 1) / 65]
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [6fb884c676ca495084aafbe35adbf283] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/6fb884c676ca495084aafbe35adbf283?project=strokeanderson-hail&region=us-central1
gcloud dataproc jobs wait '6fb884c676ca495084aafbe35adbf283' --region 'us-central1' --project 'strokeanderson-hail'
https://console.cloud.google.com/storage/browser/dataproc-staging-us-central1-538833006791-3bmpl5lg/google-cloud-dataproc-metainfo/8de1d632-8d33-4726-b7e4-9c5881b14378/jobs/6fb884c676ca495084aafbe35adbf283/
gs://dataproc-staging-us-central1-538833006791-3bmpl5lg/google-cloud-dataproc-metainfo/8de1d632-8d33-4726-b7e4-9c5881b14378/jobs/6fb884c676ca495084aafbe35adbf283/driveroutput
Submitting to cluster 'test'...
gcloud command:
gcloud dataproc jobs submit pyspark /test_combiner.py \
    --files=gs://iw-hail-anderson-strokes-test/000-hail/strokes_sample_map_test.tsv \
    --py-files=/cromwell_root/tmp.f943c555/pyscripts_fn3j9zcj.zip \
    --properties= \
    -- \
    -g \
    gs://iw-hail-anderson-strokes-test/000-hail/header.g.vcf.gz \
    -s \
    gs://iw-hail-anderson-strokes-test/000-hail/strokes_sample_map_test.tsv \
    -c \
    gs://iw-hail-anderson-strokes-test/000-hail/andersoncallset.mt \
    -t \
    gs://iw-hail-anderson-strokes-test/000-hail//tmp_20210715202734 \
    -o
Traceback (most recent call last):
  File "/usr/local/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/hailtop/hailctl/__main__.py", line 100, in main
    cli.main(args)
  File "/usr/local/lib/python3.6/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
    jmp[args.module].main(args, pass_through_args)
  File "/usr/local/lib/python3.6/site-packages/hailtop/hailctl/dataproc/submit.py", line 78, in main
    gcloud.run(cmd)
  File "/usr/local/lib/python3.6/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/usr/local/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'jobs', 'submit', 'pyspark', '/test_combiner.py', '--cluster=test', '--files=gs://iw-hail-anderson-strokes-test/000-hail/strokes_sample_map_test.tsv', '--py-files=/cromwell_root/tmp.f943c555/pyscripts_fn3j9zcj.zip', '--properties=', '--', '-g', 'gs://iw-hail-anderson-strokes-test/000-hail/header.g.vcf.gz', '-s', 'gs://iw-hail-anderson-strokes-test/000-hail/strokes_sample_map_test.tsv', '-c', 'gs://iw-hail-anderson-strokes-test/000-hail/andersoncallset.mt', '-t', 'gs://iw-hail-anderson-strokes-test/000-hail//tmp_20210715202734', '-o']' returned non-zero exit status 1.
2021/07/15 20:28:14 Starting delocalization.
2021/07/15 20:28:15 Delocalization script execution started...
2021/07/15 20:28:15 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-ea8c20a8-36cb-48df-a51d-f96205adc39b/ee570eb1-9589-4f88-b2d9-ee51b7a78c3c/CallsetWithWdl/ccd2f002-8b91-4d9a-874b-d8cc3823c546/call-CreateMatrixTable/memory_retry_rc
2021/07/15 20:28:16 Delocalizing output /cromwell_root/rc -> gs://fc-ea8c20a8-36cb-48df-a51d-f96205adc39b/ee570eb1-9589-4f88-b2d9-ee51b7a78c3c/CallsetWithWdl/ccd2f002-8b91-4d9a-874b-d8cc3823c546/call-CreateMatrixTable/rc
2021/07/15 20:28:17 Delocalizing output /cromwell_root/stdout -> gs://fc-ea8c20a8-36cb-48df-a51d-f96205adc39b/ee570eb1-9589-4f88-b2d9-ee51b7a78c3c/CallsetWithWdl/ccd2f002-8b91-4d9a-874b-d8cc3823c546/call-CreateMatrixTable/stdout
2021/07/15 20:28:19 Delocalizing output /cromwell_root/stderr -> gs://fc-ea8c20a8-36cb-48df-a51d-f96205adc39b/ee570eb1-9589-4f88-b2d9-ee51b7a78c3c/CallsetWithWdl/ccd2f002-8b91-4d9a-874b-d8cc3823c546/call-CreateMatrixTable/stderr
2021/07/15 20:28:20 Delocalization script execution complete.
2021/07/15 20:28:22 Done delocalization.

tpoterba · July 16, 2021, 1:52pm

“I am running this through the Broad’s Terra platform.”

This means running in a notebook using the Terra notebook runtime, right? Using a Dataproc cluster as the execution system?

@chrisvittal can you take over from here? I don’t have any good ideas at the moment.

iwong · July 16, 2021, 2:12pm

I’m running through the WDL/Cromwell workflow route, not the dedicated notebook environment, but I think they’re the same thing. And yes, the workflow relies on a Dataproc cluster.

tpoterba · July 16, 2021, 2:14pm

Can you run anything in that runtime? Like how about this script:

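import hail as hl

# Simulate a small dataset and run QC to exercise the cluster end to end
mt = hl.balding_nichols_model(n_populations=3, n_samples=1000, n_variants=1000, n_partitions=64)
mt = hl.variant_qc(mt)
mt = hl.sample_qc(mt)
mt._force_count_rows()  # force evaluation of the lazy pipeline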

iwong · July 16, 2021, 2:49pm

Yes, that script appears to run fine, as far as I can tell.

iwong · July 21, 2021, 2:31pm

My issue appeared similar to the one posted here: “Cromwell Retry with More Memory feature false failures” (Terra Support).
The poster’s fix of adding --driver-log-levels root=WARN to the gcloud dataproc jobs submit call did not work for me.
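For readers unfamiliar with that flag, here is a minimal sketch of where it would sit in the submit command. It only builds the argument list and does not run anything, mirroring the `["gcloud"] + command` pattern visible in the hailctl traceback above. The helper name `build_submit_command` and the `gs://my-bucket/...` path are hypothetical placeholders, and the script arguments are abbreviated.

```python
from typing import List, Optional


def build_submit_command(script: str, cluster: str,
                         driver_log_levels: Optional[str] = None,
                         script_args: Optional[List[str]] = None) -> List[str]:
    """Build (but do not run) a `gcloud dataproc jobs submit pyspark` command."""
    cmd = ["gcloud", "dataproc", "jobs", "submit", "pyspark", script,
           f"--cluster={cluster}"]
    if driver_log_levels:
        # e.g. "root=WARN", the workaround from the Terra Support post
        cmd.append(f"--driver-log-levels={driver_log_levels}")
    if script_args:
        # Everything after "--" is passed through to the script itself
        cmd += ["--"] + list(script_args)
    return cmd


cmd = build_submit_command(
    "/test_combiner.py", "test",
    driver_log_levels="root=WARN",
    script_args=["-g", "gs://my-bucket/header.g.vcf.gz"],  # placeholder path
)
print(" ".join(cmd))
```

Keeping the flag ahead of the `--` separator matters: anything after it is treated as an argument to the PySpark script rather than to gcloud.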

Running the same code in a Terra Notebook also generated the same error message.

chrisvittal · July 21, 2021, 6:07pm

If possible, can you just submit your script to a hail cluster (created directly with hailctl dataproc) and see if that is any different?

iwong · July 28, 2021, 12:14pm

I’ve tried submitting directly to a Hail cluster. I also checked whether the problem was with my VCFs by trying every pairwise combination of the five VCFs I was using. I am still running into the same error as above.
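As an illustration of the pairwise test described here, the pairs can be enumerated with `itertools.combinations`. This is only a sketch: the paths and sample names are placeholders, since the real ones are not given in this thread, and the combiner call itself is left as a comment.

```python
from itertools import combinations

# Placeholder inputs: the real GVCF paths and sample names are not in this thread.
gvcf_list = [f"gs://my-bucket/sample{i}.g.vcf.gz" for i in range(1, 6)]
samples_list = [f"sample{i}" for i in range(1, 6)]

# Keep each path together with its sample name, then take every pair.
pairs = list(combinations(zip(gvcf_list, samples_list), 2))

for (path_a, name_a), (path_b, name_b) in pairs:
    # Each pair would be passed to hl.experimental.run_combiner as
    # [path_a, path_b] with sample_names=[name_a, name_b].
    print(name_a, name_b)

print(len(pairs))  # 5 choose 2 = 10
```

Five inputs give ten pairs. If only one file were at fault, the pairs that exclude it would succeed, so a failure in all ten suggests the problem is not confined to a single input.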

tpoterba · July 28, 2021, 12:44pm

To confirm: you started a cluster with hailctl dataproc start, then submitted the script to that cluster, and it crashed with the same message?

iwong · July 28, 2021, 12:45pm

yes

tpoterba · July 28, 2021, 12:46pm

Great, thanks. Er, one more question - what version of hailctl/Hail did you use for that?

iwong · July 28, 2021, 12:47pm

0.2.74-0c3a74d12093

tpoterba · July 28, 2021, 12:49pm

Latest release, awesome. @cdv maybe next step is connecting with Isaac to watch the Spark UI when the job fails, grabbing executor logs and such?
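To know which worker's executor logs to look at, the "Lost task" line of the SparkException already names the host and executor. Below is a small sketch that pulls those fields out. The regular expression is written only against the two message formats that appear in the logs in this thread, not against any Spark specification, and `failing_executor` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Matches both "(TID 145, host, executor 2)" and "(TID 166) (host executor 1)".
LOST_TASK = re.compile(
    r"Lost task (?P<task>[\d.]+) in stage (?P<stage>[\d.]+) "
    r"\(TID (?P<tid>\d+)[,)]\s*\(?(?P<host>[^\s,]+),? "
    r"executor (?P<executor>\w+)\)"
)


def failing_executor(message: str) -> Optional[Dict[str, str]]:
    """Return task, stage, TID, host and executor from a 'Lost task' message."""
    match = LOST_TASK.search(message)
    return match.groupdict() if match else None


first_log = ("Task 2 in stage 0.0 failed 20 times, most recent failure: "
             "Lost task 2.19 in stage 0.0 (TID 145, "
             "test-w-1.c.strokeanderson-hail.internal, executor 2): "
             "java.lang.ArrayIndexOutOfBoundsException")
second_log = ("Task 0 in stage 0.0 failed 20 times, most recent failure: "
              "Lost task 0.19 in stage 0.0 (TID 166) "
              "(test-w-0.c.strokeanderson-hail.internal executor 1): "
              "java.lang.ArrayIndexOutOfBoundsException")

print(failing_executor(first_log))
print(failing_executor(second_log))
```

The driver-side stack trace in both logs stops at `java.lang.ArrayIndexOutOfBoundsException: null`, so the logs of the executor on the reported host are the place where a fuller trace might be found.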
