Import VCF with FORMAT 'DP' (coverage)

Hello everyone

I am currently facing a problem. I am trying to import VCFs with the standard set of FORMAT fields, like call_fields=['GT','AD','DP','GQ','PL'].

I am doing this in a Jupyter notebook on my local machine. I am trying to load the 1K Genomes VCFs; for this I am using the following command:

vcfs = [f"data/1kg/ALL.chr{contig}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz" for contig in range(1, 23)]
hl.import_vcf(vcfs, force_bgz=True, call_fields=['DP']).write('Haill_mt/', overwrite=True)

Then it gave me an error:

Hail version: 0.2.97-937922d7f46c
Error summary: HailException: Can only convert a header line with type 'String' to a call type. Found 'Integer'.

Apparently the problem occurs only with 'DP' (coverage). When I try other FORMAT settings, such as call_fields=['AO','RO','GT','AD','GQ','PL','PGT'], the script runs.

vcfs = [f"data/1kg/ALL.chr{contig}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz" for contig in range(1, 23)]
hl.import_vcf(vcfs, force_bgz=True, call_fields=['AO','RO','GT','AD','GQ','PL','PGT']).write('Haill_mt/', overwrite=True)

Of course, I can calculate DP with the equation DP = AO + RO and then add DP to the MatrixTable afterwards.

But I wonder: is it possible to add DP to the MatrixTable at the import_vcf step?

call_fields refers to fields of type “call” (genotype call). GT is the only field of these with a call type.

If you run import_vcf without the call_fields argument and run mt.describe() afterwards, what does that print?
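A minimal sketch of that check, assuming Hail is installed and the files sit at the paths given in the question. The helper names kg_vcf_paths and import_and_describe are hypothetical and only serve the illustration; the Hail import happens inside the function so the path helper can be used on its own.

```python
def kg_vcf_paths(contigs=range(1, 23)):
    """Build the per-chromosome VCF paths used in the question."""
    return [
        f"data/1kg/ALL.chr{contig}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz"
        for contig in contigs
    ]


def import_and_describe(paths):
    """Import without call_fields and print the schema (requires Hail)."""
    import hail as hl  # imported lazily so kg_vcf_paths works without Hail

    # No call_fields argument: Hail infers entry fields from the VCF header.
    mt = hl.import_vcf(paths, force_bgz=True)
    mt.describe()  # the "Entry fields" section lists the FORMAT fields found
    return list(mt.entry)  # entry field names, e.g. to check for 'DP'
```

If 'DP' is not in the list returned by import_and_describe(kg_vcf_paths()), the VCF has no per-sample DP to import.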

Hi @tpoterba

Here is the result of mt.describe():

Global fields:

Column fields:
's': str

Row fields:
'locus': locus
'alleles': array
'rsid': str
'qual': float64
'filters': set
'info': struct {
CIEND: array,
CIPOS: array,
CS: str,
END: int32,
MC: array,
MEINFO: array,
MEND: int32,
MLEN: int32,
MSTART: int32,
SVLEN: array,
SVTYPE: str,
TSD: str,
AC: array,
AF: array,
NS: int32,
AN: int32,
EAS_AF: array,
EUR_AF: array,
AFR_AF: array,
AMR_AF: array,
SAS_AF: array,
DP: int32,
AA: str,
VT: array,
EX_TARGET: bool,

Entry fields:
'GT': call

Column key: ['s']
Row key: ['locus', 'alleles']

The only FORMAT field this VCF has is “GT”. It seems DP doesn’t exist as a FORMAT (per-sample) field in this VCF; the DP in the output above is an INFO field.
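One way to confirm this without Hail is to read the ##FORMAT lines of the VCF header directly. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the regular expression assumes the usual ID, Number, Type ordering inside each header line.

```python
import gzip
import re

# Assumes header lines of the form ##FORMAT=<ID=...,Number=...,Type=...,...>
FORMAT_LINE = re.compile(r"^##FORMAT=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")


def format_fields_from_header(lines):
    """Return {ID: Type} for every ##FORMAT line in a VCF header."""
    fields = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM line
        match = FORMAT_LINE.match(line)
        if match:
            field_id, _number, field_type = match.groups()
            fields[field_id] = field_type
    return fields


def format_fields_from_vcf(path):
    """Read only the header of a gzip/bgzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        return format_fields_from_header(handle)
```

Calling format_fields_from_vcf on one of the files from the question should show whether a FORMAT line with ID=DP is declared at all, and with which Type.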

Hi @tpoterba

Does it have to be added at an earlier step, for example at the BAM-to-VCF step?