Missing "END" field from parsed .g.vcf files

Hi folks,

I have loaded about 7,000 .g.vcf files into a VDS using hl.vds.new_combiner().

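For context, the combine step looked roughly like this (the paths, interval settings, and gVCF list are placeholders, not my exact values):

import hail as hl

hl.init(default_reference='GRCh38')

# placeholder list standing in for my ~7,000 gVCF paths
gvcf_paths = [
    'gs://my-bucket/gvcfs/sample1.g.vcf.gz',
    'gs://my-bucket/gvcfs/sample2.g.vcf.gz',
]

combiner = hl.vds.new_combiner(
    output_path='gs://my-bucket/combined.vds',   # placeholder
    temp_path='gs://my-bucket/tmp/',             # placeholder
    gvcf_paths=gvcf_paths,
    use_genome_default_intervals=True,
    reference_genome='GRCh38',
)
combiner.run()

vds = hl.vds.read_vds('gs://my-bucket/combined.vds')
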
I’m hoping to follow much of the gnomAD process to create a public dataset from these samples, but the first script from the gnomAD docs, create_last_END_positions.py, tries to run mt.select_entries("END"), and there is no END field available in my VDS. My starting .g.vcf files do contain END, but I don’t see it when I run describe() on my large VDS (output below, followed by a minimal reproduction of the failing call):

VDS.variant_data.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
----------------------------------------
Entry fields:
    'LA': array<int32>
    'LGT': call
    'LAD': array<int32>
    'LPGT': call
    'LPL': array<int32>
    'RGQ': int32
    'gvcf_info': struct {
        BaseQRankSum: float64, 
        ExcessHet: float64, 
        InbreedingCoeff: float64, 
        MLEAC: array<int32>, 
        MLEAF: array<float64>, 
        MQRankSum: float64, 
        RAW_MQandDP: array<int32>, 
        ReadPosRankSum: float64
    }
    'DP': int32
    'GP': array<float64>
    'GQ': int32
    'MIN_DP': int32
    'PG': array<float64>
    'PID': str
    'PS': int32
    'SB': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------
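
For completeness, this is roughly the call from create_last_END_positions.py that fails for me (path is a placeholder):

import hail as hl

vds = hl.vds.read_vds('gs://my-bucket/combined.vds')   # placeholder path
mt = vds.variant_data

# This fails, since 'END' is not among the entry fields listed above:
mt = mt.select_entries('END')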

Maybe END is there and I’m just not accessing it properly? Alternatively, the gnomAD scripts suggest this step only helps in downstream steps, so maybe it is unnecessary for my data?

Thanks for any help!

Try vds.reference_data.describe()

The VDS splits reference and variant data into two separate MatrixTables, both for efficiency (it is better for storage and for compute) and for interface reasons.
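
For example, something like this should show it (path is a placeholder; this assumes your combiner version stores END, rather than a block length, on the reference blocks):

import hail as hl

vds = hl.vds.read_vds('gs://my-bucket/combined.vds')   # placeholder path

# Reference blocks carry END; variant_data does not.
vds.reference_data.describe()

# Peek at a few reference-block END values:
vds.reference_data.END.show(5)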