Info fields are out of dated once allele specific filters are applied. It will also happen when subsetting samples. In most of occasions, where VDS is split into biallelics, you can directly use variant_qc
to update AF, AN and AC info fields. But sometimes we want to want with an non-split VDS, which variant_qc
is no longer applicable. We can use filter_alleles
, callStats
and expression language to solve this problem. Here is a module that I write to accomplish this task, I will see it useful for people with similar needs.
def updateInfo(self):
'''
This module update 6 info fields, including AF, AC, AN, DP, MLEAF, MLEAC
while applying allele specific filters/subsetting samples on a non-splitted VDS.
'''
annotation = """va.info.AC = aIndices[1:].map(i => va.info.AC[i-1]),
va.info.AF = aIndices[1:].map(i => va.info.AF[i-1]),
va.info.MLEAC = aIndices[1:].map(i => va.info.MLEAC[i-1]),
va.info.MLEAF = aIndices[1:].map(i => va.info.MLEAF[i-1])"""
self = (self.annotate_variants_expr('va.callstats = gs.callStats(g => v), va.info.DP = gs.map(g => g.dp).sum()')
.annotate_variants_expr('va.info.AC = va.callstats.AC[1:], va.info.AF = va.callstats.AF[1:], va.info.AN = va.callstats.AN')
.filter_alleles('va.callstats.AC[aIndex] == 0', annotation = annotation, keep = False)
)
return self
VariantDataset.updateInfo = updateInfo
Note that you want to update the VCF header after you made those changes to make sure that it is compatible with downstream tools (mostly GATK). You can do that by set_va_attributes
.