int32ANDo error

Hi Hail team,

I’m trying to test code that generates allele frequencies for 300k exomes. I ran this code successfully in July but am running into this error now:

GETFIELD __C2987__m2981DECODE_r_struct_of_o_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_int32ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_int32ANDo_binaryANDo_ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [bd1ef365284346108d930c9c560e55d2] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/bd1ef365284346108d930c9c560e55d2?project=maclab-ukbb&region=us-central1
gcloud dataproc jobs wait 'bd1ef365284346108d930c9c560e55d2' --region 'us-central1' --project 'maclab-ukbb'
https://console.cloud.google.com/storage/browser/dataproc-1aca38e4-67fe-4b64-b451-258ef1aea4d1-us-central1/google-cloud-dataproc-metainfo/a861c7c7-aa76-44db-9729-d6af23dac8fe/jobs/bd1ef365284346108d930c9c560e55d2/
gs://dataproc-1aca38e4-67fe-4b64-b451-258ef1aea4d1-us-central1/google-cloud-dataproc-metainfo/a861c7c7-aa76-44db-9729-d6af23dac8fe/jobs/bd1ef365284346108d930c9c560e55d2/driveroutput

The code reads in a sparse MatrixTable, densifies, and calls annotate_freq: https://github.com/broadinstitute/gnomad_methods/blob/master/gnomad/utils/annotations.py#L317.

Do you have any suggestions for how to fix this? I will send the log via email.

This looks like a "class too large" issue: the generated code for your pipeline is too big to compile into a single class. I don’t think it has anything to do with the number of samples. There might be some weirdness going on from Hail’s side, but you can certainly work around it by breaking the computation into a couple of pieces instead of running all the frequency computations in one go.
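To make "a couple of pieces" a bit more concrete, here is a rough, library-agnostic sketch of the batching pattern. It assumes the frequency strata can be listed up front and computed independently. The names `chunked`, `compute_in_batches`, and `compute_batch` are hypothetical helpers for illustration, not part of Hail or gnomad_methods. In a real pipeline, `compute_batch` would be whatever function runs the frequency annotation for only that subset of strata and saves the intermediate result before moving on.

```python
from typing import Callable, Dict, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def compute_in_batches(
    groups: Sequence[str],
    batch_size: int,
    compute_batch: Callable[[List[str]], Dict[str, int]],
) -> Dict[str, int]:
    """Run compute_batch on each batch of groups and merge the results.

    Each call only sees a few groups, so each step stays small instead of
    building one huge computation that covers every group at once.
    """
    results: Dict[str, int] = {}
    for batch in chunked(groups, batch_size):
        results.update(compute_batch(batch))
    return results


if __name__ == "__main__":
    # Hypothetical strata labels; a real run would use the actual groupings.
    strata = ["all", "pop=afr", "pop=eas", "pop=nfe", "sex=XX", "sex=XY"]

    # Stand-in for the real per-batch work (assumption: in practice this
    # would annotate frequencies for these strata and persist the output).
    def fake_compute(batch: List[str]) -> Dict[str, int]:
        return {name: len(name) for name in batch}

    merged = compute_in_batches(strata, batch_size=2, compute_batch=fake_compute)
    print(sorted(merged))
```

The batch size is a tuning knob: smaller batches mean more passes over the data but a smaller computation per pass, which is the trade-off that matters for this error.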
