Error: "No content to map due to end-of-input"

jinasong · September 25, 2020, 8:37pm

Hello,

In this run with over 5K gvcfs,

hl.experimental.run_combiner(inputs, out_file=output_file, tmp_path=temp_bucket, branch_factor=100, batch_size=100, reference_genome='GRCh38', use_genome_default_intervals=True)

I got this error message:

JsonMappingException: No content to map due to end-of-input

This error occurred in the second round, which merges 51 MTs into 1 MT.

Please let me know how to solve this. Thank you.
-Jina

jinasong · September 28, 2020, 5:47pm

Hello,

If you need more information for solving this issue, please let me know. Thank you.

-Jina

chrisvittal · September 28, 2020, 6:07pm

Jina,

This indicates that something has gone wrong with the metadata files in the matrix tables. Unfortunately, the information we currently report does not say which file caused the issue.

This error seems to indicate that the metadata file is truncated or empty. I have no idea how this happened.

You may be able to figure out which file was the issue by going through all the intermediate matrix tables that have been written, finding the metadata.json.gz file in the root of each of the matrix table directories and decompressing them to see which one is empty or invalid.
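A minimal sketch of that check in plain Python, assuming the intermediate matrix tables have first been copied to a local directory (the function names and the directory layout are made up for illustration, not part of Hail). It looks at every metadata.json.gz under the given root, including the ones in subdirectories of each matrix table, and reports any that fail to decompress, are empty, or do not parse as JSON.

```python
import gzip
import json
import zlib
from pathlib import Path


def check_metadata(path):
    """Return None if the file decompresses and parses as JSON, else a short reason."""
    try:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            text = fh.read()
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # Truncated or corrupted gzip data ends up here.
        return f"cannot decompress: {exc}"
    if not text.strip():
        # A zero-byte file decompresses to nothing.
        return "empty after decompression"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc}"
    return None


def find_bad_metadata(root):
    """Check every metadata.json.gz under root; return {path: reason} for the bad ones."""
    bad = {}
    for path in sorted(Path(root).rglob("metadata.json.gz")):
        reason = check_metadata(path)
        if reason is not None:
            bad[str(path)] = reason
    return bad
```

Calling find_bad_metadata on the local copy of the temporary directory returns a dict mapping each problematic file to the reason it failed. An empty dict means every metadata file found was readable.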

Best,
Chris

jinasong · September 28, 2020, 6:34pm

Hi @chrisvittal,

Thank you so much for your advice. It could be a good starting point for solving this issue.

-Jina

jinasong · September 28, 2020, 11:31pm

Hi @chrisvittal

Fortunately, the run_combiner job with over 5k gvcfs completed successfully after I increased the boot disk size from 200G to 1T, as below.

--worker-boot-disk-size=1000 --preemptible-worker-boot-disk-size=1000

By the way, when I checked metadata.json.gz following your advice, I noticed that some tasks in my job have 0 counts, as below. How should I interpret this? Should I worry about the 0s?

Thank you.

-Jina

aneesha_d · January 4, 2021, 5:06am

Hi,

I’m facing the same issue - JsonMappingException: No content to map due to end-of-input. I did check the metadata.json.gz files. They are not empty, and there is enough memory available for running the processes.

I’m attaching the SparkUI page where I’m tracking the processes’ progress, for reference. Stages 0, 1 and 2 correspond to processing one vcf, and 3, 4 and 5 correspond to another. I don’t know the source of stage 6. MatrixTables (MT) for both the VCFs are getting generated, and I’m able to load them separately to glance through the content. However, the pipeline fails with this error.

at [Source: (is.hail.io.compress.BGzipInputStream); line: 1, column: 0]
(Adding the line above because it isn’t visible in the image.)

Any idea which stage of Hail (importing the VCF or writing to MT) is causing this?

Thanks!
Aneesha
