Gcloud job failure

Hi Hail Team!

I recently ran the following script using hailctl dataproc. It took 19 hours to run and all of the steps seemed to complete, but at the end I received an error message that was not very informative, and nothing was actually written to the output path.

Is there something wrong with the code? Are there other potential sources of error?

ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [d7070dc3db81471a8be2582d82ecc92e] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:

gcloud dataproc jobs wait 'd7070dc3db81471a8be2582d82ecc92e' --region 'us-east1' --project 'brave-cubist-363320'

gs://dataproc-staging-us-east1-942231253036-bw4veo0a/google-cloud-dataproc-metainfo/9e941d43-e4d7-43c5-a07a-3084830814df/jobs/d7070dc3db81471a8be2582d82ecc92e/driveroutput
Traceback (most recent call last):
  File "/Users/xyz123/Library/Python/3.7/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/Users/xyz123/Library/Python/3.7/lib/python/site-packages/hailtop/hailctl/main.py", line 107, in main
    cli.main(args)
  File "/Users/xyz123/Library/Python/3.7/lib/python/site-packages/hailtop/hailctl/dataproc/cli.py", line 124, in main
    jmp[args.module].main(args, pass_through_args))
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/Users/xyz123/Library/Python/3.7/lib/python/site-packages/hailtop/hailctl/dataproc/submit.py", line 88, in main
    gcloud.run(cmd)
  File "/Users/xyz123/Library/Python/3.7/lib/python/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'jobs', 'submit', 'pyspark', '/Users/xyz123/Desktop/AUTUMN/AMP-PD_Code/densify_filter.py', '--cluster=cluster1', '--files=', '--py-files=/var/folders/qg/q8g70wfd4493bfmkmjgcb3qh0000gn/T/pyscripts_xi5m2d68.zip', '--properties=', '--region=us-east1', '--', '-i', 'gs://filepath/v1-2-amp-pd-mgrb.sparse.mt/', '-o', 'gs://file_path/test_start.mt/']' returned non-zero exit status 1.

import hail as hl

import argparse

# Arguments
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--full_run", action="store_true",
                    help="Runs on chr22 and chrX only by default. If full_run is set, it runs on the whole matrix. WARNING: This will be VERY expensive")
parser.add_argument("-w", "--overwrite", action="store_true",
                    help="If set, will overwrite the output matrix if it already exists")
requiredNamed = parser.add_argument_group("required named arguments")
requiredNamed.add_argument("-i", "--input_mt_path", required=True)
requiredNamed.add_argument("-o", "--output_mt_path", required=True)
# requiredNamed.add_argument("-p", "--requester_pays_project_id", help="Project ID to bill to when accessing requester pays bucket, needed to access hail annotationDB")
args = parser.parse_args()

# Store inputs
input_mt_path = args.input_mt_path
output_mt_path = args.output_mt_path
# requester_pays_project_id = args.requester_pays_project_id

# Read and densify mt
# NOTE: --full_run is parsed above, but no chr22/chrX filter is applied below
mt = hl.read_matrix_table(input_mt_path)
mt = hl.experimental.densify(mt)

# Save mt densified and filtered to CHR22/PPMI (honor the --overwrite flag)
mt.write(output_mt_path, overwrite=args.overwrite)
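For reference, both boolean flags use `action="store_true"`, so they default to `False` unless explicitly passed on the command line. A minimal, self-contained sketch of how the parsing behaves (flag names copied from the script above, everything else stripped out):

```python
import argparse

# Reproduce only the two boolean flags from the script above
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--full_run", action="store_true")
parser.add_argument("-w", "--overwrite", action="store_true")

# With no flags passed, both default to False
defaults = parser.parse_args([])
print(defaults.full_run, defaults.overwrite)  # False False

# Passing -w sets overwrite to True; full_run stays False
args = parser.parse_args(["-w"])
print(args.full_run, args.overwrite)  # False True
```

Note that neither flag changes behavior unless the parsed value is actually used downstream (e.g. passed through to `mt.write`).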