Hey, I have about 3k genomes that I want to joint-call. I like how easy it is to add GVCFs (all called with GATK 4.4 HaplotypeCaller in allele-specific (AS) mode) to a VDS, and I would like to keep using it to add new GVCFs continuously, since they will arrive in batches.
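A minimal sketch of the batch-by-batch workflow, assuming a Hail version that ships `hl.vds.new_combiner`, whose `vds_paths` argument lets an existing VDS be combined with new GVCFs. The helper names `next_combiner_kwargs` and `run_combiner` and all paths are hypothetical, and the exact keyword set should be checked against the `new_combiner` docs for your Hail version. Hail is imported lazily so the planning helper can be checked without it.

```python
from typing import Any, Dict, List, Optional


def next_combiner_kwargs(
    batch_gvcfs: List[str],
    output_vds: str,
    temp_path: str,
    previous_vds: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for one hl.vds.new_combiner run."""
    if previous_vds is not None and previous_vds == output_vds:
        # Write each run to a new (e.g. versioned) path instead of overwriting the input VDS.
        raise ValueError("output_vds must differ from previous_vds")
    kwargs: Dict[str, Any] = {
        "output_path": output_vds,
        "temp_path": temp_path,
        "gvcf_paths": batch_gvcfs,
        "use_genome_default_intervals": True,  # whole genomes, not exomes
        "reference_genome": "GRCh38",
    }
    if previous_vds is not None:
        # Merge the new batch into the VDS written by the previous run.
        kwargs["vds_paths"] = [previous_vds]
    return kwargs


def run_combiner(kwargs: Dict[str, Any]) -> None:
    # Lazy import: only this step needs Hail installed.
    import hail as hl

    combiner = hl.vds.new_combiner(**kwargs)
    combiner.run()
```

For example, the first batch would use `next_combiner_kwargs(batch1, "cohort_v1.vds", "tmp/")` and the second `next_combiner_kwargs(batch2, "cohort_v2.vds", "tmp/", previous_vds="cohort_v1.vds")`, each passed to `run_combiner`.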
However, I am struggling to export a VCF with the INFO fields needed for allele-specific (AS) variant recalibration. There are some (older) posts on this, some referring to the gnomad utils, but the GATK format has changed since then, so that approach no longer works and is hard to debug. Is there a method for the recent version? Or would it be easier to just use GenomicsDBImport for this?
This is the structure I have:
```python
mt = hl.vds.to_dense_mt(vds)
mt.describe()
```

```
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
----------------------------------------
Entry fields:
    'LGT': call
    'DP': int32
    'GQ': int32
    'MIN_DP': int32
    'LA': array<int32>
    'LAD': array<int32>
    'LPGT': call
    'LPL': array<int32>
    'RGQ': int32
    'gvcf_info': struct {
        AS_InbreedingCoeff: array<float64>,
        AS_QD: array<float64>,
        AS_RAW_BaseQRankSum: str,
        AS_RAW_MQ: array<float64>,
        AS_RAW_MQRankSum: array<tuple (
            float64,
            int32
        )>,
        AS_RAW_ReadPosRankSum: array<tuple (
            float64,
            int32
        )>,
        AS_SB_TABLE: array<array<int32>>,
        BaseQRankSum: float64,
        ExcessHet: float64,
        InbreedingCoeff: float64,
        MLEAC: array<int32>,
        MLEAF: array<float64>,
        MQRankSum: float64,
        RAW_MQandDP: array<int32>,
        ReadPosRankSum: float64
    }
    'PID': str
    'PS': int32
    'SB': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------
```
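Part of producing AS INFO fields is reducing the per-sample raw values in `gvcf_info` to one value per alternate allele at each site. As a plain-Python model of that step for a single annotation (not the gnomad utils implementation), the sketch below sums per-sample `AS_SB_TABLE` counts into global allele order and then computes AS_SOR with the formula used by GATK's StrandOddsRatio annotation (pseudocount 1). It assumes, and this should be verified on real data, that each sample's `AS_SB_TABLE` holds one `[forward, reverse]` pair per local allele in the same order as its `LA` array. The function names are hypothetical. AS_FS from the same merged table would additionally need a Fisher's exact test, which is not shown.

```python
import math
from typing import List, Sequence

Pair = List[int]  # [forward_count, reverse_count]


def merge_as_sb_tables(
    n_alleles: int,
    local_alleles: Sequence[Sequence[int]],
    local_tables: Sequence[Sequence[Sequence[int]]],
) -> List[Pair]:
    """Sum per-sample AS_SB_TABLE entries into one [fwd, rev] pair per global allele.

    Assumption: entry k of a sample's table belongs to global allele LA[k].
    zip() stops at the shorter input, so an extra trailing entry (e.g. for
    <NON_REF>, if one was kept) is ignored.
    """
    merged: List[Pair] = [[0, 0] for _ in range(n_alleles)]
    for la, table in zip(local_alleles, local_tables):
        for global_idx, (fwd, rev) in zip(la, table):
            merged[global_idx][0] += fwd
            merged[global_idx][1] += rev
    return merged


def strand_odds_ratio(ref: Pair, alt: Pair, pseudocount: float = 1.0) -> float:
    """SOR for one ref/alt 2x2 strand table, following GATK's StrandOddsRatio formula."""
    ref_fw, ref_rv = ref[0] + pseudocount, ref[1] + pseudocount
    alt_fw, alt_rv = alt[0] + pseudocount, alt[1] + pseudocount
    ratio = (ref_fw / ref_rv) * (alt_rv / alt_fw)
    symmetrical_ratio = ratio + 1.0 / ratio
    ref_ratio = min(ref_fw, ref_rv) / max(ref_fw, ref_rv)
    alt_ratio = min(alt_fw, alt_rv) / max(alt_fw, alt_rv)
    return math.log(symmetrical_ratio) + math.log(ref_ratio) - math.log(alt_ratio)


def as_sor(merged: List[Pair]) -> List[float]:
    """One AS_SOR value per alternate allele (global allele index 1 onward)."""
    return [strand_odds_ratio(merged[0], alt) for alt in merged[1:]]
```

With two samples at a site with alleles `[A, C, T]`, one with `LA = [0, 1]` and one with `LA = [0, 2]`, the merged table has one row per global allele and `as_sor` returns one value each for C and T. In Hail the same reduction would run as a row-wise aggregation over entries before `hl.export_vcf`.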
Thanks for any suggestions.