I ran the combiner on over 5K gVCFs with the following call:
hl.experimental.run_combiner(inputs, out_file=output_file, tmp_path=temp_bucket, branch_factor=100, batch_size=100, reference_genome='GRCh38', use_genome_default_intervals=True)
I got this error message:
JsonMappingException: No content to map due to end-of-input
This error occurred in the second round, which merges 51 MTs into 1 MT.
Please let me know how to solve this. If you need more information to solve this issue, I can provide it. Thank you.
This indicates that something has gone wrong with the metadata files in the matrix tables. Unfortunately, the information we currently have does not say which file caused the issue.
This error seems to indicate that the metadata file is truncated or empty. I have no idea how this happened.
You may be able to figure out which file is the problem by going through all the intermediate matrix tables that have been written, finding the metadata.json.gz file in the root of each matrix table directory, and decompressing each one to see which is empty or invalid.
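The check described above can be sketched in Python. This is a minimal sketch, not part of Hail: the function name find_bad_metadata is hypothetical, and it assumes the intermediate matrix tables are on a local filesystem (files in a cloud bucket would first need to be copied down). It also looks at every metadata.json.gz under the given directory, not only the ones at the root of each matrix table, which is a superset of what the advice asks for.

```python
import gzip
import json
import zlib
from pathlib import Path


def find_bad_metadata(root):
    """Return (path, reason) pairs for every metadata.json.gz under `root`
    that is empty, cannot be decompressed, or does not parse as JSON.

    `root` is assumed to be a local directory containing the intermediate
    matrix table directories (hypothetical layout for illustration).
    """
    problems = []
    for path in sorted(Path(root).rglob("metadata.json.gz")):
        try:
            with gzip.open(path, "rb") as handle:
                raw = handle.read()
            # A zero-byte file decompresses to nothing; report it explicitly.
            if not raw.strip():
                problems.append((str(path), "empty"))
                continue
            json.loads(raw.decode("utf-8"))
        except (OSError, EOFError, zlib.error) as exc:
            # Not gzip at all, or a gzip stream that was cut short.
            problems.append((str(path), f"unreadable gzip: {exc}"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Decompressed fine, but the content is not valid JSON.
            problems.append((str(path), f"invalid JSON: {exc}"))
    return problems
```

Calling find_bad_metadata on the directory that holds the intermediate tables returns an empty list if every metadata file decompresses and parses; otherwise each entry names a file worth inspecting by hand.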
Thank you so much for your advice. It could be a good starting point for solving this issue.
Fortunately, the run_combiner job with over 5K gVCFs completed successfully after I increased the boot-disk-size from 200G to 1T as below.
By the way, when I checked metadata.json.gz following your advice, I noticed that some tasks have 0 counts in my job, as below. How should I interpret this? Should I worry about the 0s?
I’m facing the same issue - JsonMappingException: No content to map due to end-of-input. I did check the metadata.json.gz files. They are not empty, and there is enough memory available for running the processes.
I’m attaching the Spark UI page where I’m tracking the processes’ progress, for reference. Stages 0, 1 and 2 correspond to processing one VCF, and 3, 4 and 5 correspond to another. I don’t know the source of stage 6. MatrixTables (MTs) for both VCFs are being generated, and I’m able to load them separately to glance through the content. However, the pipeline fails with this error.
at [Source: (is.hail.io.compress.BGzipInputStream); line: 1, column: 0]
(Adding the line above as it isn’t visible in the image.)
Any idea which stage in Hail (importing the VCF, writing to the MT) is causing this?