Repartition on read and AssertionError

Hi hail team!

I’m trying to repartition a large Table (written with only 150 partitions) to 30000 partitions. However, I run into an AssertionError when I try to use this Table after repartitioning on read. For example, running this:

import hail as hl

ht = hl.read_table('gs://regional_missense_constraint/temp/mpc_temp_gnomad.ht', _n_partitions=30000)
ht.show()

produces this error:

FatalError: AssertionError: assertion failed

Java stack trace:
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:208)
	at is.hail.rvd.IndexedRVDSpec2.readTableStage(AbstractRVDSpec.scala:542)
	at is.hail.expr.ir.TableNativeReader.lower(TableIR.scala:1039)
	at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:706)
	at is.hail.expr.ir.lowering.LowerTableIR$.lower$2(LowerTableIR.scala:686)
	at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:1551)
	at is.hail.expr.ir.lowering.LowerTableIR$.lower$2(LowerTableIR.scala:686)
	at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:944)
	at is.hail.expr.ir.lowering.LowerTableIR$.lower$2(LowerTableIR.scala:686)
	at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:1142)
	at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:458)
	at is.hail.expr.ir.lowering.LowerTableIR$.apply(LowerTableIR.scala:537)
	at is.hail.expr.ir.lowering.LowerToCDA$.lower(LowerToCDA.scala:69)
	at is.hail.expr.ir.lowering.LowerToCDA$.apply(LowerToCDA.scala:18)
	at is.hail.expr.ir.lowering.LowerToDistributedArrayPass.transform(LoweringPass.scala:77)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:27)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
	at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
	at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:416)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:452)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:69)
	at is.hail.utils.package$.using(package.scala:640)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:69)
	at is.hail.utils.package$.using(package.scala:640)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:58)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:449)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:448)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Hail version: 0.2.95-513139587f57
Error summary: AssertionError: assertion failed

Running this, however, works fine:

ht = hl.read_table('gs://regional_missense_constraint/temp/mpc_temp_gnomad.ht')
ht.show()

Log:
repartition_test.log (770.1 KB)

Maybe this error is unrelated to the repartitioning? I’d appreciate any insight.

Thanks in advance!

Does the table have no key? It looks like _n_partitions only works if the table has a key.

(It looks like that’s the assertion being hit in the stack trace.)
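
For anyone else hitting the same assertion, here is a minimal sketch of checking for a key before repartitioning on read. The helper names `has_key` and `read_repartitioned` are made up for illustration, and the sketch assumes `len(ht.key)` is 0 for an unkeyed table. Note that `_n_partitions` is an underscore-prefixed argument, so its behavior may differ between Hail versions.

```python
def has_key(ht) -> bool:
    """Return True if the table has at least one key field."""
    # Assumption: ht.key is a sized collection of key fields
    # that is empty when the table is unkeyed.
    return len(ht.key) > 0


def read_repartitioned(path: str, n_partitions: int):
    """Read a Hail Table with a new partition count, failing early if it is unkeyed."""
    import hail as hl  # imported lazily so has_key can be used without Hail installed

    # Plain read first, only to inspect the table's key.
    ht = hl.read_table(path)
    if not has_key(ht):
        raise ValueError(
            f"Table at {path} has no key; re-key and rewrite it "
            "before reading with _n_partitions."
        )
    # Same call as in the original post, now known to be on a keyed table.
    return hl.read_table(path, _n_partitions=n_partitions)
```

If the check fails, one option is to re-key the table with `key_by` on whatever fields fit your data, write it to a new path, and then read that path with `_n_partitions`. Which fields to key by depends on the table, so that part is left out here.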

Ugh, yes, I accidentally unkeyed it. Thanks!