Error when running count after filtering MT

Hello Hail team,

I am running into an error when calling count() after filtering on either columns or rows.

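import hail as hl
# get_gnomad_data is a helper from the gnomAD team's own utilities, not part of Hail itself
mt = get_gnomad_data('genomes', adj=True, release_annotations=True, split=True)
# Keep only the rows overlapping these four genes
mt = hl.filter_intervals(mt, hl.experimental.get_gene_intervals(gene_symbols=['AP4B1', 'AP4E1', 'AP4M1', 'AP4S1']))
mt.rows().count()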

throws this error:

FatalError: IllegalArgumentException: requirement failed

Java stack trace:
java.lang.IllegalArgumentException: requirement failed
	at scala.Predef$.require(Predef.scala:212)
	at is.hail.expr.ir.TableValue.<init>(TableValue.scala:47)
	at is.hail.expr.ir.TableNativeZippedReader.apply(TableIR.scala:245)
	at is.hail.expr.ir.TableRead.execute(TableIR.scala:295)
	at is.hail.expr.ir.TableFilterIntervals.execute(TableIR.scala:1714)
	at is.hail.expr.ir.Interpret$$anonfun$apply$2.apply$mcJ$sp(Interpret.scala:730)
	at is.hail.expr.ir.Interpret$$anonfun$apply$2.apply(Interpret.scala:730)
	at is.hail.expr.ir.Interpret$$anonfun$apply$2.apply(Interpret.scala:730)
	at scala.Option.getOrElse(Option.scala:121)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:730)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:89)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:59)
	at is.hail.expr.ir.InterpretNonCompilable$$anonfun$7.apply(InterpretNonCompilable.scala:19)
	at is.hail.expr.ir.InterpretNonCompilable$$anonfun$7.apply(InterpretNonCompilable.scala:19)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:19)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$2.apply(CompileAndEvaluate.scala:37)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$2.apply(CompileAndEvaluate.scala:37)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:24)
	at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:37)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:55)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:55)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:8)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:7)
	at is.hail.utils.package$.using(package.scala:596)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
	at is.hail.backend.Backend.execute(Backend.scala:55)
	at is.hail.backend.Backend.executeJSON(Backend.scala:61)
	at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)



Hail version: 0.2.24-9cd88d97bedd
Error summary: IllegalArgumentException: requirement failed

mt.count() runs if I do not filter to the gene intervals. However, if I instead filter to the gnomAD release samples using mt = mt.filter_cols(mt.meta.release), I see the same error. I am using version 0.2.24-9cd88d97bedd.

Can we have the log file?

Sorry… I cannot for the life of me figure out how to attach a file here. Google has failed me. How should I send it?

This is perhaps non-intuitive, or just aggressively intuitive, but you should be able to drag and drop. I just changed the allowed file extensions to permit .log files.

foo.log (9 Bytes)

Yup, now I can. Before, it would only accept image extensions. Thank you.
gnomad_clinvar.log (403.8 KB)

aha:

(TableRead Table{global:Struct{},key:,row:Struct{s:String}} False “{"name":"TableNativeReader","path":"gs://gnomad/hardcalls/hail-0.2/mt/exomes/gnomad.exomes.mt/cols","_spec":{"name":"TableSpec","file_version":65536,"hail_version":"devel-a23032101373","references_rel_path":"…/references","table_type":"Table{global:Struct{},key:[s],row:Struct{s:String}}","components":{"globals":{"name":"RVDComponentSpec","rel_path":"…/globals/rows"},"rows":{"name":"RVDComponentSpec","rel_path":"rows"},"partition_counts":{"name":"PartitionCountsComponentSpec","counts":[164332]}}}}”))))))

This is a super old file. I bet it has the required-globals problem.

I’m fairly confident I back-patched (read: manually edited the metadata.json.gz) all the gnomAD files, so unless there’s another issue, I would think it should work.

Maybe try a different file (the exomes, or the non-split hardcalls) to double-check; maybe I missed one, but this one’s a pretty big workhorse, so I’d be surprised. Running mt = mt.select_globals() before the count is another way to check whether that’s the issue.

I fixed the error message here to give us more information. Can you try running on the latest master?

(if you need help building from source, let me know)

mt = get_gnomad_data('exomes', adj=True, release_annotations=True, release_samples=True, split=True)
mt = mt.select_globals()
mt.count() 

Fails with the same error.

I attempted to build from source but received this error when running ./install-gcs-connector.sh:

ERROR: (gcloud.iam.service-accounts.keys.create) RESOURCE_EXHAUSTED: Maximum number of keys on account reached.
'@type': type.googleapis.com/google.rpc.RetryInfo
  retryDelay: 86401s

I haven’t done this before, so I likely did something wrong. However, when I initialize Hail locally in the hail/hail directory, I am running version 0.2.24-e3e63a2f9856. I think it’s just that I can’t connect to GCS?

This code works for me with my home-spun 9c44fc9e7c2b build (or maybe 22f6defd17d, who knows with my rig anymore; anyway, relatively recent), so I don’t think it’s the old requiredness problem.

@mwilson I meant: try it on the cloud with a custom build. To do this, run (from hail/hail):

HAILCTL_BUCKET_BASE="gs://a-bucket-you-can-write-to" make install-hailctl

With or without mt = mt.select_globals(), I now see this error when running the code above. Would the full log be helpful?

FatalError: RuntimeException: globals mismatch:
typ: Struct{}
val: +Struct{}

Java stack trace:
java.lang.RuntimeException: globals mismatch:
  typ: Struct{}
  val: +Struct{}
	at is.hail.expr.ir.TableValue.<init>(TableValue.scala:50)
	at is.hail.expr.ir.TableNativeReader.apply(TableIR.scala:181)
	at is.hail.expr.ir.TableRead.execute(TableIR.scala:295)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:748)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:89)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:59)
	at is.hail.expr.ir.InterpretNonCompilable$$anonfun$5.apply(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.InterpretNonCompilable$$anonfun$5.apply(InterpretNonCompilable.scala:16)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$2.apply(CompileAndEvaluate.scala:37)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$2.apply(CompileAndEvaluate.scala:37)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:24)
	at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:37)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:57)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:57)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:8)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:7)
	at is.hail.utils.package$.using(package.scala:596)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
	at is.hail.backend.Backend.execute(Backend.scala:57)
	at is.hail.backend.Backend.executeJSON(Backend.scala:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)



Hail version: 0.2.24-e3e63a2f9856
Error summary: RuntimeException: globals mismatch:
  typ: Struct{}
  val: +Struct{}
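
For anyone reading this error later: the two types in the message are identical except for the leading +, which in Hail’s type-string notation marks a value as required (non-missing). The naive sketch below (plain Python; the helper names are hypothetical, and it works on the printed strings, not on Hail’s internal type objects) checks whether two such strings differ only in their requiredness markers:

```python
def strip_required(type_str: str) -> str:
    """Drop Hail-style '+' requiredness markers from a printed type string.

    Naive on purpose: it assumes '+' never appears inside field names.
    """
    return type_str.replace("+", "")


def differs_only_in_requiredness(typ: str, val: str) -> bool:
    """True if the two type strings differ, but only by '+' markers."""
    return typ != val and strip_required(typ) == strip_required(val)


# The pair from the error message above.
print(differs_only_in_requiredness("Struct{}", "+Struct{}"))  # True
```

So the stored globals and the expected globals agree on structure; only their requiredness disagrees.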

@konradjk you didn’t get all the +s 🙁

I think there must be some buried in other metadata.json.gz files that weren’t read then but are now.
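
If the leftover + markers are in component metadata files rather than the top-level one, one way to locate them is to scan every metadata.json.gz under a table or matrix table directory. A minimal sketch, assuming a local copy of the directory (for example downloaded with gsutil -m cp -r) and assuming the offending type string appears verbatim as +Struct{}, the value printed in the error above; find_metadata_containing is a hypothetical helper, not part of Hail:

```python
import gzip
import os
from typing import List


def find_metadata_containing(root: str, needle: str = "+Struct{}") -> List[str]:
    """Return sorted paths of metadata.json.gz files under root whose text contains needle.

    Hypothetical helper: the default needle is the required empty-struct type
    printed in the "globals mismatch" error; adjust it for other types.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "metadata.json.gz" in filenames:
            path = os.path.join(dirpath, "metadata.json.gz")
            # Metadata files are gzip-compressed JSON text.
            with gzip.open(path, "rt", encoding="utf-8") as f:
                if needle in f.read():
                    hits.append(path)
    return sorted(hits)
```

Calling find_metadata_containing("gnomad.exomes.mt") on a local copy would list each metadata file that still mentions the required empty struct. It is a plain substring search, so it can also flag legitimate uses of that type and each hit needs a manual look before editing anything.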

Ah, that makes sense. I think I only had to patch the top-level metadata.json.gz before, but maybe now we need to do more. Sigh.

I can fix this in Hail, though.

Oh, that’d be great. The previous fix was pretty nerve-wracking.

The code above (which filters only the columns) now runs, but when I run

mt = mt.filter_rows(hl.agg.any(mt.GT.is_non_ref()))
mt.count()

I am seeing the same globals mismatch error.

The log file might help here, then.