VCFParseError when writing a MatrixTable


I am loading a multisample VCF as:

dta = hl.import_vcf(filename, force_bgz = True, array_elements_required = True, min_partitions = 2)
mt = dta.key_rows_by('locus') \
    .distinct_by_row() \
    .key_rows_by('locus', 'alleles')

I have multiple variants annotated at the same locus, so I am splitting the file with:

x1 = hl.split_multi_hts(mt)

Then I am creating some annotations with transmute_entries and annotate_rows and I want to save the file.


x1.write('~/annotated', overwrite = True)
FatalError: VCFParseError: missing value in FORMAT array. Import with argument 'array_elements_required=False'
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 53.0 failed 1 times, most recent failure: Lost task 3.0 in stage 53.0 (TID 559, localhost, executor driver): is.hail.utils.HailException: 12174: missing value in FORMAT array. Import with argument 'array_elements_required=False'
... 3:.:.:0,73,892,73,892,892 0/1:12,14,.:26:99:0|1:3789194_G_GCGGCCC:638,0, ...
offending line: 1	3789195	.	T	TGTCCCTGCCGC,C	289664	.	BaseQRankSum=0.374;Cli...
Hail version: 0.2.22-597b3bd86135
Error summary: VCFParseError: missing value in FORMAT array. Import with argument 'array_elements_required=False'

Any help understanding what is happening or solving the issue is welcome.

By default, Hail expects that there are no missing values inside of FORMAT array fields like AD and PL. This is to enable optimizations that save on space and compute.

You do have missing values here, so import the VCF with the argument array_elements_required=False, as the error suggests.
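As a minimal sketch of that change, the import from the question can be repeated with only the one argument flipped. The wrapper function name is hypothetical and exists just to keep the snippet self-contained; the other arguments are copied from the original call. Since Hail evaluates lazily, it is plausible that the parse error only surfaced at write time because that is when the offending lines were actually read.

```python
def import_vcf_allowing_missing(filename):
    """Re-import the VCF, allowing missing elements inside FORMAT arrays.

    Hypothetical helper name; the arguments mirror the original import_vcf
    call from the question, except array_elements_required is now False.
    """
    # Imported inside the function so that defining it does not require Hail.
    import hail as hl

    return hl.import_vcf(
        filename,
        force_bgz=True,
        array_elements_required=False,  # accept values such as 12,14,.
        min_partitions=2,
    )
```

The rest of the pipeline (key_rows_by, split_multi_hts, the annotations, and write) should then run on the returned MatrixTable as before.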

I'd like to understand array_elements_required=False a bit better. When it is set, what specifically happens to the row that failed above?
Is it still parsed? What happens to the specific offending ,.: field?

Thanks, Alan

With the default array_elements_required=True, we don't need to keep track of the missingness of array elements, which leads to smaller files and faster processing. The only parse error this setting should throw is the one above, which explicitly calls out a missing value in an array whose elements are assumed to be always present.
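To make the behavior with array_elements_required=False concrete: as I read the import_vcf documentation, an array field such as 1,.,5 is kept as an array whose second element is missing, while a lone . makes the entire field missing. So the row should still be parsed, and the offending AD value 12,14,. would come through as an array with a missing last element. The following is a pure-Python illustration of that rule, not Hail's actual parser; parse_format_array is a hypothetical name, and the handling of a lone . under the default setting is an assumption.

```python
from typing import List, Optional


def parse_format_array(
    field: str, array_elements_required: bool = True
) -> Optional[List[Optional[int]]]:
    """Illustrative stand-in for reading one FORMAT array (e.g. AD or PL)."""
    if field == ".":
        # A lone "." means the whole field is missing, not [missing].
        return None
    values: List[Optional[int]] = []
    for token in field.split(","):
        if token == ".":
            if array_elements_required:
                # Mirrors the error message reported in the question.
                raise ValueError(
                    "missing value in FORMAT array. "
                    "Import with argument 'array_elements_required=False'"
                )
            values.append(None)  # element-level missingness is tracked
        else:
            values.append(int(token))
    return values


# The AD value from the offending sample in the error message.
ad = parse_format_array("12,14,.", array_elements_required=False)
```

Here None plays the role of a missing value in Hail, so downstream expressions over AD would need to tolerate a missing element.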