ClassFormatError when writing a MatrixTable

Hi all, I’m getting a ClassFormatError when I try to write a MatrixTable. The full error is below; the same issue also seems to be noted in this post.

I’m just trying to save my results. For now, I can keep rebuilding the MatrixTable from scratch each time I start a cluster.

I’m running Hail (0.2.78) on DNAnexus.

Best,
Jeremy

2022-12-19 18:18:48 Hail: INFO: Ordering unsorted dataset with network shuffle
2022-12-19 18:19:10 Hail: INFO: Ordering unsorted dataset with network shuffle
2022-12-19 18:19:32 Hail: INFO: Ordering unsorted dataset with network shuffle
2022-12-19 18:20:00 Hail: INFO: Ordering unsorted dataset with network shuffle
2022-12-19 18:20:20 Hail: INFO: Ordering unsorted dataset with network shuffle
2022-12-19 18:20:44 Hail: INFO: Ordering unsorted dataset with network shuffle

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<ipython-input-32-e9a5a6a89d4e> in <module>
      1 # output prior to running pheWAS
      2 
----> 3 mt_snps.write(f'dnax://{my_database}/jeremy_mt_snps.mt/', overwrite=True)

<decorator-gen-1275> in write(self, output, overwrite, stage_locally, _codec_spec, _partitions, _checkpoint_file)

/opt/conda/lib/python3.6/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    575     def wrapper(__original_func, *args, **kwargs):
    576         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577         return __original_func(*args_, **kwargs_)
    578 
    579     return wrapper

/opt/conda/lib/python3.6/site-packages/hail/matrixtable.py in write(self, output, overwrite, stage_locally, _codec_spec, _partitions, _checkpoint_file)
   2542 
   2543         writer = ir.MatrixNativeWriter(output, overwrite, stage_locally, _codec_spec, _partitions, _partitions_type, _checkpoint_file)
-> 2544         Env.backend().execute(ir.MatrixWrite(self._mir, writer))
   2545 
   2546     class _Show:

/opt/conda/lib/python3.6/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
    108                 raise HailUserError(message_and_trace) from None
    109 
--> 110             raise e

/opt/conda/lib/python3.6/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
     84         # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))
     85         try:
---> 86             result_tuple = self._jhc.backend().executeEncode(jir, stream_codec)
     87             (result, timings) = (result_tuple._1(), result_tuple._2())
     88             value = ir.typ._from_encoding(result)

/cluster/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/opt/conda/lib/python3.6/site-packages/hail/backend/py4j_backend.py in deco(*args, **kwargs)
     29                 raise FatalError('%s\n\nJava stack trace:\n%s\n'
     30                                  'Hail version: %s\n'
---> 31                                  'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
     32         except pyspark.sql.utils.CapturedException as e:
     33             raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: ClassFormatError: Too many arguments in method signature in class file __C60175collect_distributed_array

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 413.0 failed 4 times, most recent failure: Lost task 9.3 in stage 413.0 (TID 1605, ip-10-60-4-132.eu-west-2.compute.internal, executor 2): java.lang.ClassFormatError: Too many arguments in method signature in class file __C60175collect_distributed_array
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:635)
	at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:253)
	at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:249)
	at is.hail.asm4s.ClassesBytes$$anonfun$load$1.apply(ClassBuilder.scala:65)
	at is.hail.asm4s.ClassesBytes$$anonfun$load$1.apply(ClassBuilder.scala:63)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at is.hail.asm4s.ClassesBytes.load(ClassBuilder.scala:63)
	at is.hail.expr.ir.EmitClassBuilder$$anon$1.apply(EmitClassBuilder.scala:669)
	at is.hail.expr.ir.EmitClassBuilder$$anon$1.apply(EmitClassBuilder.scala:662)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1$$anonfun$apply$1.apply(BackendUtils.scala:31)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1$$anonfun$apply$1.apply(BackendUtils.scala:30)
	at is.hail.utils.package$.using(package.scala:638)
	at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:144)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1.apply(BackendUtils.scala:30)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1.apply(BackendUtils.scala:28)
	at is.hail.backend.spark.SparkBackendComputeRDD.compute(SparkBackend.scala:730)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2001)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1984)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1983)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1983)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1033)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1033)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1033)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2223)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2172)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2161)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:823)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at is.hail.backend.spark.SparkBackend.parallelizeAndComputeWithIndex(SparkBackend.scala:286)
	at is.hail.backend.BackendUtils.collectDArray(BackendUtils.scala:28)
	at __C58896Compiled.__m59284split_WriteMetadata_region33_121(Emit.scala)
	at __C58896Compiled.__m59284split_WriteMetadata_region11_125(Emit.scala)
	at __C58896Compiled.__m59284split_WriteMetadata(Emit.scala)
	at __C58896Compiled.__m59097split_Let(Emit.scala)
	at __C58896Compiled.apply(Emit.scala)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$_apply$1.apply$mcV$sp(CompileAndEvaluate.scala:57)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$_apply$1.apply(CompileAndEvaluate.scala:57)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$_apply$1.apply(CompileAndEvaluate.scala:57)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:57)
	at is.hail.expr.ir.CompileAndEvaluate$.evalToIR(CompileAndEvaluate.scala:30)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:30)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.is$hail$expr$ir$LowerOrInterpretNonCompilable$$rewrite$1(LowerOrInterpretNonCompilable.scala:67)
	at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
	at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:16)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:16)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:14)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:15)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:13)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
	at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:381)
	at is.hail.backend.spark.SparkBackend$$anonfun$8$$anonfun$apply$4.apply(SparkBackend.scala:417)
	at is.hail.backend.spark.SparkBackend$$anonfun$8$$anonfun$apply$4.apply(SparkBackend.scala:414)
	at is.hail.backend.ExecuteContext$$anonfun$scoped$1$$anonfun$apply$1.apply(ExecuteContext.scala:47)
	at is.hail.backend.ExecuteContext$$anonfun$scoped$1$$anonfun$apply$1.apply(ExecuteContext.scala:47)
	at is.hail.utils.package$.using(package.scala:638)
	at is.hail.backend.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:47)
	at is.hail.backend.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:46)
	at is.hail.utils.package$.using(package.scala:638)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
	at is.hail.backend.spark.SparkBackend$$anonfun$8.apply(SparkBackend.scala:414)
	at is.hail.backend.spark.SparkBackend$$anonfun$8.apply(SparkBackend.scala:413)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

java.lang.ClassFormatError: Too many arguments in method signature in class file __C60175collect_distributed_array
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:635)
	at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:253)
	at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:249)
	at is.hail.asm4s.ClassesBytes$$anonfun$load$1.apply(ClassBuilder.scala:65)
	at is.hail.asm4s.ClassesBytes$$anonfun$load$1.apply(ClassBuilder.scala:63)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at is.hail.asm4s.ClassesBytes.load(ClassBuilder.scala:63)
	at is.hail.expr.ir.EmitClassBuilder$$anon$1.apply(EmitClassBuilder.scala:669)
	at is.hail.expr.ir.EmitClassBuilder$$anon$1.apply(EmitClassBuilder.scala:662)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1$$anonfun$apply$1.apply(BackendUtils.scala:31)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1$$anonfun$apply$1.apply(BackendUtils.scala:30)
	at is.hail.utils.package$.using(package.scala:638)
	at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:144)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1.apply(BackendUtils.scala:30)
	at is.hail.backend.BackendUtils$$anonfun$collectDArray$1.apply(BackendUtils.scala:28)
	at is.hail.backend.spark.SparkBackendComputeRDD.compute(SparkBackend.scala:730)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)




Hail version: 0.2.78-b17627756568
Error summary: ClassFormatError: Too many arguments in method signature in class file __C60175collect_distributed_array

We’ve made some improvements to Hail since 0.2.78 that should help with this, but in general this error means the Hail compiler is generating too large an expression. Can you share a bit more information about the code you’re executing?

In general, operations that generate a lot of column aggregations are likely to encounter this error.
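
For illustration, a pattern roughly like the one below (hypothetical field names, not your code) is what tends to do it: each aggregated field adds to the compiled expression, so producing very many of them in one pipeline can push the generated class past JVM class-file limits.

import hail as hl

# Hypothetical illustration: one aggregated column field per entry field.
# With dozens or hundreds of these, the generated class gets very large.
entry_fields = ['DP', 'GQ']  # imagine many more
mt = mt.annotate_cols(**{
    f'mean_{f}': hl.agg.mean(mt[f]) for f in entry_fields
})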

Also, the Hail log file would be tremendously helpful for debugging this.

Yes, mt_snps is a MatrixTable containing the SNPs for one gene, CASR, on chromosome 3. It is heavily annotated in both rows (e.g. VEP, QC metrics) and columns (many clinical measurements from UKB). The describe() output is below.

I am running this simple write statement:

mt_snps.write(f'dnax://{my_database}/output.mt/', overwrite=True)

Let me try to figure out how to share the log file with you… it’s too large to upload directly.

Best,
Jeremy

UPDATE: I removed some of the row annotation statements and was then able to write the MatrixTable, so I think some of those annotation expressions were simply too large…

----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
    'sample_data': struct {
        p30680_i0: float64, 
        p30680_i1: float64, 
        p31: str, 
        p22027: str, 
        p22001: str, 
        p22006: str, 
        p22009_a1: float64, 
        p22009_a2: float64, 
        p22009_a3: float64, 
        p22009_a4: float64, 
        p22009_a5: float64, 
        p22009_a6: float64, 
        p22009_a7: float64, 
        p22009_a8: float64, 
        p22009_a9: float64, 
        p22009_a10: float64, 
        p21003_i0: int64, 
        p21003_i1: int64, 
        p21003_i2: int64, 
        p21003_i3: int64, 
        p30600_i0: float64, 
        p30600_i1: float64
    }
    'Ca_adj_i0': float64
    'Ca_adj_i1': float64
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        AF: array<float64>, 
        AQ: array<int32>, 
        AC: array<int32>, 
        AN: int32
    }
    'variant_qc': struct {
        dp_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        AC: array<int32>, 
        AF: array<float64>, 
        AN: int32, 
        homozygote_count: array<int32>, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_het: int64, 
        n_non_ref: int64, 
        het_freq_hwe: float64, 
        p_value_hwe: float64
    }
    'a_index': int32
    'was_split': bool
    'AC': array<int32>
    'vep': struct {
        assembly_name: str, 
        allele_string: str, 
        ancestral: str, 
        colocated_variants: array<struct {
            aa_allele: str, 
            aa_maf: float64, 
            afr_allele: str, 
            afr_maf: float64, 
            allele_string: str, 
            amr_allele: str, 
            amr_maf: float64, 
            clin_sig: array<str>, 
            end: int32, 
            eas_allele: str, 
            eas_maf: float64, 
            ea_allele: str, 
            ea_maf: float64, 
            eur_allele: str, 
            eur_maf: float64, 
            exac_adj_allele: str, 
            exac_adj_maf: float64, 
            exac_allele: str, 
            exac_afr_allele: str, 
            exac_afr_maf: float64, 
            exac_amr_allele: str, 
            exac_amr_maf: float64, 
            exac_eas_allele: str, 
            exac_eas_maf: float64, 
            exac_fin_allele: str, 
            exac_fin_maf: float64, 
            exac_maf: float64, 
            exac_nfe_allele: str, 
            exac_nfe_maf: float64, 
            exac_oth_allele: str, 
            exac_oth_maf: float64, 
            exac_sas_allele: str, 
            exac_sas_maf: float64, 
            id: str, 
            minor_allele: str, 
            minor_allele_freq: float64, 
            phenotype_or_disease: int32, 
            pubmed: array<int32>, 
            sas_allele: str, 
            sas_maf: float64, 
            somatic: int32, 
            start: int32, 
            strand: int32
        }>, 
        context: str, 
        end: int32, 
        id: str, 
        input: str, 
        intergenic_consequences: array<struct {
            allele_num: int32, 
            consequence_terms: array<str>, 
            impact: str, 
            minimised: int32, 
            variant_allele: str
        }>, 
        most_severe_consequence: str, 
        motif_feature_consequences: array<struct {
            allele_num: int32, 
            consequence_terms: array<str>, 
            high_inf_pos: str, 
            impact: str, 
            minimised: int32, 
            motif_feature_id: str, 
            motif_name: str, 
            motif_pos: int32, 
            motif_score_change: float64, 
            strand: int32, 
            variant_allele: str
        }>, 
        regulatory_feature_consequences: array<struct {
            allele_num: int32, 
            biotype: str, 
            consequence_terms: array<str>, 
            impact: str, 
            minimised: int32, 
            regulatory_feature_id: str, 
            variant_allele: str
        }>, 
        seq_region_name: str, 
        start: int32, 
        strand: int32, 
        transcript_consequences: array<struct {
            allele_num: int32, 
            amino_acids: str, 
            appris: str, 
            biotype: str, 
            canonical: int32, 
            ccds: str, 
            cdna_start: int32, 
            cdna_end: int32, 
            cds_end: int32, 
            cds_start: int32, 
            codons: str, 
            consequence_terms: array<str>, 
            distance: int32, 
            domains: array<struct {
                db: str, 
                name: str
            }>, 
            exon: str, 
            gene_id: str, 
            gene_pheno: int32, 
            gene_symbol: str, 
            gene_symbol_source: str, 
            hgnc_id: str, 
            hgvsc: str, 
            hgvsp: str, 
            hgvs_offset: int32, 
            impact: str, 
            intron: str, 
            lof: str, 
            lof_flags: str, 
            lof_filter: str, 
            lof_info: str, 
            minimised: int32, 
            polyphen_prediction: str, 
            polyphen_score: float64, 
            protein_end: int32, 
            protein_start: int32, 
            protein_id: str, 
            sift_prediction: str, 
            sift_score: float64, 
            strand: int32, 
            swissprot: str, 
            transcript_id: str, 
            trembl: str, 
            tsl: int32, 
            uniparc: str, 
            variant_allele: str
        }>, 
        variant_class: str
    }
    'vep_proc_id': struct {
        part_idx: int32, 
        block_idx: int32
    }
    'hgvsp_NM_001178065': str
    'hgvsp_NM_000388': str
    'old_locus': locus<GRCh38>
    'old_alleles': array<str>
    'old_to_new': array<int32>
    'new_to_old': array<int32>
    'variant_str': str
    'hethom_median_p30680_i0': float64
    'hethom_quartiles_p30680_i0': array<float64>
    'hethom_stats_p30680_i0': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hethom_median_p30680_i1': float64
    'hethom_quartiles_p30680_i1': array<float64>
    'hethom_stats_p30680_i1': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'het_median_p30680_i0': float64
    'het_quartiles_p30680_i0': array<float64>
    'het_stats_p30680_i0': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'het_median_p30680_i1': float64
    'het_quartiles_p30680_i1': array<float64>
    'het_stats_p30680_i1': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hom_median_p30680_i0': float64
    'hom_quartiles_p30680_i0': array<float64>
    'hom_stats_p30680_i0': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hom_median_p30680_i1': float64
    'hom_quartiles_p30680_i1': array<float64>
    'hom_stats_p30680_i1': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hethom_median_p30680_i0_adj': float64
    'hethom_quartiles_p30680_i0_adj': array<float64>
    'hethom_stats_p30680_i0_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hethom_median_p30680_i1_adj': float64
    'hethom_quartiles_p30680_i1_adj': array<float64>
    'hethom_stats_p30680_i1_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'het_median_p30680_i0_adj': float64
    'het_quartiles_p30680_i0_adj': array<float64>
    'het_stats_p30680_i0_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'het_median_p30680_i1_adj': float64
    'het_quartiles_p30680_i1_adj': array<float64>
    'het_stats_p30680_i1_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hom_median_p30680_i0_adj': float64
    'hom_quartiles_p30680_i0_adj': array<float64>
    'hom_stats_p30680_i0_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'hom_median_p30680_i1_adj': float64
    'hom_quartiles_p30680_i1_adj': array<float64>
    'hom_stats_p30680_i1_adj': struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
    'samples_with_variant': struct {
        EID: array<str>, 
        p30680_i0: array<float64>, 
        p30680_i1: array<float64>
    }
    'samples_with_variant_adj': struct {
        EID: array<str>, 
        p30680_i0_adj: array<float64>, 
        p30680_i1_adj: array<float64>
    }
    'linear_regression_p30680_i0': struct {
        n: int32, 
        sum_x: float64, 
        y_transpose_x: float64, 
        beta: float64, 
        standard_error: float64, 
        t_stat: float64, 
        p_value: float64
    }
----------------------------------------
Entry fields:
    'GT': call
    'RNC': array<str>
    'DP': int32
    'AD': array<int32>
    'GQ': int32
    'PL': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------

Yeah, for better or worse, Hail generates code whose size is linear in the number of fields.

One trick for dealing with this (it works, but makes the data a bit more annoying to work with) is to store these median/quartiles/stats structures as a single array field:

array<struct {
    median: float64, 
    quartiles: array<float64>, 
    stats: struct {
        mean: float64, 
        stdev: float64, 
        min: float64, 
        max: float64, 
        n: int64, 
        sum: float64
    }
}>

And then just have a convention that the first element of the array is hethom_i0, the second is hethom_i1, etc.
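
In Hail code, the trick could look roughly like this. It’s a minimal sketch: the carrier_summary helper, the hethom_summaries field name, and the phenotype list are invented for illustration, and the phenotypes are assumed to live under the sample_data column field as in your describe() output.

import hail as hl

def carrier_summary(pheno, carrier_filter):
    # One median/quartiles/stats struct for a phenotype, restricted to carriers.
    return hl.agg.filter(
        carrier_filter,
        hl.struct(
            median=hl.agg.approx_median(pheno),
            quartiles=hl.agg.approx_quantiles(pheno, [0.25, 0.5, 0.75]),
            stats=hl.agg.stats(pheno),
        ),
    )

# The order of this list is the convention: element 0 is instance 0,
# element 1 is instance 1, and so on.
phenos = [mt_snps.sample_data.p30680_i0, mt_snps.sample_data.p30680_i1]

mt_snps = mt_snps.annotate_rows(
    hethom_summaries=hl.array([
        carrier_summary(p, mt_snps.GT.is_het() | mt_snps.GT.is_hom_var())
        for p in phenos
    ])
)

This leaves you with one array<struct{...}> row field instead of separate median, quartiles, and stats fields for every phenotype, so the number of row fields (and the generated code) no longer grows with the number of phenotypes.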

Alternatively, a lot of folks store their row annotations in separate Hail Table files and only “join” them in when necessary.
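
A sketch of that approach, assuming the heavy annotations are already on the MatrixTable (the paths and the choice of fields to split off are illustrative):

import hail as hl

heavy_fields = ['vep', 'variant_qc', 'samples_with_variant']

# Keep the bulky row annotations in their own keyed Table...
row_annotations = mt_snps.rows().select(*heavy_fields)
row_annotations.write(f'dnax://{my_database}/row_annotations.ht', overwrite=True)

# ...and write a slimmed-down MatrixTable without them.
mt_slim = mt_snps.drop(*heavy_fields)
mt_slim.write(f'dnax://{my_database}/mt_snps_slim.mt', overwrite=True)

# Later, read both back and join the annotations on the row key only when needed.
mt_slim = hl.read_matrix_table(f'dnax://{my_database}/mt_snps_slim.mt')
annotations = hl.read_table(f'dnax://{my_database}/row_annotations.ht')
mt_joined = mt_slim.annotate_rows(heavy=annotations[mt_slim.row_key])

The join is keyed on the MatrixTable’s row key, so the annotations line up exactly where they started, but each individual write should stay well under the size limits on generated code.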