Hey, I have about 3k genomes that I want to jointly call. I like how easy it is to add GVCFs (all from HaplotypeCaller in GATK 4.4 with AS annotations) to a VDS, and I would like to use it to continuously add new GVCFs as they arrive in batches.
However, I am struggling to export a VCF with the INFO fields needed for allele-specific (AS) variant recalibration. Some (old) posts refer to gnomad utils, but because the GATK format has changed, that approach no longer works and is hard to debug. Is there a method for the current version? Or would it be easier to just use GenomicsDBImport for this?
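For reference, GATK writes raw allele-specific annotations such as AS_SB_TABLE as one group per allele separated by `|`, with the values inside a group separated by `,` (for example `20,15|5,3`). Below is a minimal sketch of rendering a per-allele strand table in that shape. It assumes the table is already a list of [forward, reverse] counts in global allele order (reference first); `format_as_sb_table` is a hypothetical helper, and the exact string your GATK version expects should be checked against a VCF produced by GenotypeGVCFs.

```python
from typing import Sequence


def format_as_sb_table(table: Sequence[Sequence[int]]) -> str:
    """Render per-allele strand counts as 'fwd,rev|fwd,rev|...'.

    Hypothetical helper: table[i] is assumed to hold the [forward, reverse]
    read counts for allele i (index 0 = reference), in global allele order.
    """
    # Join counts within an allele with ',' and alleles with '|'.
    return "|".join(",".join(str(count) for count in allele) for allele in table)


# Usage: reference plus one alternate allele.
print(format_as_sb_table([[20, 15], [5, 3]]))  # 20,15|5,3
```

In Hail itself, the same join could presumably be built with `hl.delimit` over string-converted arrays before calling `hl.export_vcf`.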

This is the structure I have:

```python
mt = hl.vds.to_dense_mt(vds)
```

```
Global fields:
Column fields:
    's': str
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
Entry fields:
    'LGT': call
    'DP': int32
    'GQ': int32
    'MIN_DP': int32
    'LA': array<int32>
    'LAD': array<int32>
    'LPGT': call
    'LPL': array<int32>
    'RGQ': int32
    'gvcf_info': struct {
        AS_InbreedingCoeff: array<float64>, 
        AS_QD: array<float64>, 
        AS_RAW_BaseQRankSum: str, 
        AS_RAW_MQ: array<float64>, 
        AS_RAW_MQRankSum: array<tuple (
        AS_RAW_ReadPosRankSum: array<tuple (
        AS_SB_TABLE: array<array<int32>>, 
        BaseQRankSum: float64, 
        ExcessHet: float64, 
        InbreedingCoeff: float64, 
        MLEAC: array<int32>, 
        MLEAF: array<float64>, 
        MQRankSum: float64, 
        RAW_MQandDP: array<int32>, 
        ReadPosRankSum: float64
    'PID': str
    'PS': int32
    'SB': array<int32>
Column key: ['s']
Row key: ['locus', 'alleles']
```
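The entry fields above are in local-allele form: LGT, LAD and LPL are indexed by each sample's own alleles, and LA maps every local allele index to a global allele index. Hail provides `hl.vds.lgt_to_gt` and `hl.vds.local_to_global` for this conversion; the plain-Python sketch below only illustrates the logic for a Number=R field such as LAD (one value per allele, reference included), filling alleles the sample did not observe with 0. The function name `local_to_global_r` is hypothetical.

```python
from typing import List, Sequence


def local_to_global_r(local_values: Sequence[int], la: Sequence[int],
                      n_alleles: int, fill_value: int = 0) -> List[int]:
    """Scatter a Number=R local array (e.g. LAD) into global allele order.

    la[i] is the global allele index of local allele i, so local_values[i]
    is written to position la[i]; every other slot keeps fill_value.
    """
    if len(local_values) != len(la):
        raise ValueError("local array and LA must have the same length")
    global_values = [fill_value] * n_alleles
    for local_idx, global_idx in enumerate(la):
        global_values[global_idx] = local_values[local_idx]
    return global_values


# Site with alleles [ref, alt1, alt2]; this sample only carries ref and alt2.
print(local_to_global_r([12, 7], la=[0, 2], n_alleles=3))  # [12, 0, 7]
```

In Hail, the equivalent would be something along the lines of `hl.vds.local_to_global(mt.LAD, mt.LA, hl.len(mt.alleles), 0, number='R')` (please check the signature against the docs for your Hail version).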

Thanks for any suggestions.

The Hail team is very much not expert in this area. We don’t maintain anything to annotate a VDS with (AS) VQSR annotations.

I’m happy to review code that calculates the needed annotations, or even to write an example for you to base more code on. I just need to know at least one of the expected annotations and how it is derived.
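As an illustration of the kind of derivation meant here, one plausible way to build a joint-called AS_SB_TABLE is to sum the per-sample raw tables allele by allele. That this matches GATK's joint genotyping exactly is an assumption to verify against a GenomicsDB/GenotypeGVCFs run. The sketch assumes each sample's table has already been moved from local to global allele order and that any trailing `<NON_REF>` entry has been dropped; `sum_as_sb_tables` is a hypothetical helper.

```python
from typing import List, Sequence


def sum_as_sb_tables(tables: Sequence[Sequence[Sequence[int]]],
                     n_alleles: int) -> List[List[int]]:
    """Element-wise sum of per-sample AS_SB_TABLEs in global allele order.

    Each table is assumed to have n_alleles rows of [forward, reverse]
    counts; samples without data at the site can simply be left out.
    """
    total = [[0, 0] for _ in range(n_alleles)]
    for table in tables:
        if len(table) != n_alleles:
            raise ValueError("table is not in global allele order")
        for allele_idx, (fwd, rev) in enumerate(table):
            total[allele_idx][0] += fwd
            total[allele_idx][1] += rev
    return total


sample_a = [[10, 8], [3, 1], [0, 0]]
sample_b = [[6, 9], [0, 0], [2, 4]]
print(sum_as_sb_tables([sample_a, sample_b], n_alleles=3))
# [[16, 17], [3, 1], [2, 4]]
```

In Hail, the row-level version would be an aggregation over entries (for example `hl.agg.array_sum` inside `annotate_rows`), though the nested table would likely need to be flattened first, since `array_sum` works on flat numeric arrays.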

Alternatively, if your data is amenable to GenomicsDB, that may be easier for you, since the GATK team maintains the necessary transformation from GVCF annotations to jointly genotyped AS annotations.