Hi folks,
I have loaded about 7000 .g.vcf files into a VDS using the hl.vds.new_combiner() command.
I’m hoping to follow much of the gnomAD process to create a public dataset from these samples, but the first command from the gnomAD docs, create_last_END_positions.py, is trying to run mt.select_entries("END") while there is no END field available in my VDS. I do have an END column in my starting .g.vcf files, but I don’t see it when I run describe() on my large VDS:
VDS.variant_data.describe()
----------------------------------------
Global fields:
None
----------------------------------------
Column fields:
's': str
----------------------------------------
Row fields:
'locus': locus<GRCh38>
'alleles': array<str>
'rsid': str
----------------------------------------
Entry fields:
'LA': array<int32>
'LGT': call
'LAD': array<int32>
'LPGT': call
'LPL': array<int32>
'RGQ': int32
'gvcf_info': struct {
    BaseQRankSum: float64,
    ExcessHet: float64,
    InbreedingCoeff: float64,
    MLEAC: array<int32>,
    MLEAF: array<float64>,
    MQRankSum: float64,
    RAW_MQandDP: array<int32>,
    ReadPosRankSum: float64
}
'DP': int32
'GP': array<float64>
'GQ': int32
'MIN_DP': int32
'PG': array<float64>
'PID': str
'PS': int32
'SB': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------
Maybe END is there, but I’m not accessing it properly? Alternatively, the gnomAD scripts suggest that this will help in downstream steps, but maybe it is unnecessary?
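One check that might be worth running is sketched below. A Hail VDS has two components, variant_data (described above) and reference_data, and reference-block fields such as END (or LEN in newer Hail releases) would normally be expected in reference_data rather than variant_data. The path argument and both function names are hypothetical and only for illustration. I have not confirmed what this particular VDS contains.

```python
def reference_block_fields(entry_field_names):
    """Return whichever reference-block fields (END, LEN) appear in the given names."""
    return [name for name in ("END", "LEN") if name in entry_field_names]


def describe_vds_components(vds_path):
    """Print the schema of both VDS components and report reference-block fields.

    vds_path is a placeholder for wherever the combiner wrote its output.
    """
    # Imported here so the helper above can be used without Hail installed.
    import hail as hl

    vds = hl.vds.read_vds(vds_path)

    # The component described in the post: sparse variant calls only.
    vds.variant_data.describe()

    # The other component: reference blocks, where END would be expected.
    vds.reference_data.describe()

    # Iterating a Hail struct expression yields its field names.
    return reference_block_fields(list(vds.reference_data.entry))
```

If the returned list contains END, something like vds.reference_data.select_entries("END") may be closer to what the gnomAD script needs than running it against variant_data.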
Thanks for any help!