While running Hail (version 0.2.47-d9e1f3a110c8) on Google Dataproc (Spark version 2.4.5), the method hl.experimental.run_combiner() failed with:
File "/tmp/fa5ffc019a184f189e94e356dd38d7b5/combiner.py", line 28, in <module>
reference_genome='GRCh38')
File "/opt/conda/default/lib/python3.6/site-packages/hail/experimental/vcf_combiner/vcf_combiner.py", line 596, in run_combiner
hl.experimental.write_matrix_tables(merge_mts, tmp, overwrite=True)
File "", line 2, in write_matrix_tables
File "/opt/conda/default/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
return original_func(*args, **kwargs)
File "/opt/conda/default/lib/python3.6/site-packages/hail/experimental/write_multiple.py", line 17, in write_matrix_tables
Env.backend().execute(MatrixMultiWrite([mt._mir for mt in mts], writer))
File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/spark_backend.py", line 296, in execute
result = json.loads(self._jhc.backend().executeJSON(jir))
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/spark_backend.py", line 41, in deco
'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: AssertionError: assertion failed
Ack, this is a tabix index issue. I think we need to isolate the file that’s causing the problem. I’ll create a development build you can run that will print the file path in the error.
Is there some Python Hail code I could run on each VCF + index, one by one in a loop, to make sure they are correctly formatted and not corrupted?
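Not an official Hail check, but since the gVCFs and their .tbi indexes are BGZF files, a plain-Python sketch can at least catch truncated files by looking for the 28-byte empty BGZF block that bgzip/tabix append to every complete file (the helper name and the loop over paths are hypothetical; this assumes local copies of the files, so gs:// paths would first need to be copied down):

```python
import os

# The 28-byte empty BGZF block written at the end of every intact
# bgzipped file; a .vcf.gz or .tbi without it is almost certainly truncated.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000")

def looks_complete(path):
    """Return True if the file ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as f:
        f.seek(size - len(BGZF_EOF))
        return f.read() == BGZF_EOF

# Hypothetical loop over the inputs and their indexes:
# for vcf in vcf_paths:
#     for p in (vcf, vcf + ".tbi"):
#         if not looks_complete(p):
#             print("possibly truncated:", p)
```

This only detects truncation, not a stale or mismatched index; a heavier check would be to try a small import/count of each file in Hail and catch exceptions.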
By the way, this time it ran 45 minutes longer than the previous tries (6 h 30 min vs. 5 h 45 min for the last three tries), which could suggest that the reindex/reupload fixed the one VCF it was failing on, and it simply hit a second corrupted vcf/index later on?
I fixed a second vcf/index and the combiner completed successfully!
I ran it on the whole genome because I was not able to subset to chr22: I got hail.utils.java.FatalError: HailException: range bounds must be inclusive when running:
# run chr22 only for a faster test:
contig = 'chr22'
chr22_interval = [hl.Interval(
    start=hl.Locus(contig=contig, position=1, reference_genome='GRCh38'),
    end=hl.Locus.parse(f'{contig}:END', reference_genome='GRCh38'))]
print("run combiner chr22")
hl.experimental.run_combiner(inputs,
                             intervals=chr22_interval,
                             out_file=output_file,
                             tmp_path=temp_bucket,
                             reference_genome='GRCh38')
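For what it's worth, hl.Interval defaults to an exclusive end bound, which may be what triggers "range bounds must be inclusive". A sketch of two possible workarounds, untested here and assuming the Hail 0.2 API:

```python
import hail as hl

# Option 1: let Hail build the whole-contig interval itself;
# parse_locus_interval('chr22') covers the full contig.
chr22_interval = [hl.eval(
    hl.parse_locus_interval('chr22', reference_genome='GRCh38'))]

# Option 2: build the interval explicitly, ending at the contig
# length and marking the end bound as inclusive.
rg = hl.get_reference('GRCh38')
chr22_interval = [hl.Interval(
    start=hl.Locus('chr22', 1, reference_genome='GRCh38'),
    end=hl.Locus('chr22', rg.lengths['chr22'], reference_genome='GRCh38'),
    includes_end=True)]
```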