Info fields are out of dated once allele specific filters are applied. It will also happen when subsetting samples. In most of occasions, where VDS is split into biallelics, you can directly use
variant_qc to update AF, AN and AC info fields. But sometimes we want to want with an non-split VDS, which
variant_qc is no longer applicable. We can use
callStats and expression language to solve this problem. Here is a module that I write to accomplish this task, I will see it useful for people with similar needs.
def updateInfo(self): ''' This module update 6 info fields, including AF, AC, AN, DP, MLEAF, MLEAC while applying allele specific filters/subsetting samples on a non-splitted VDS. ''' annotation = """va.info.AC = aIndices[1:].map(i => va.info.AC[i-1]), va.info.AF = aIndices[1:].map(i => va.info.AF[i-1]), va.info.MLEAC = aIndices[1:].map(i => va.info.MLEAC[i-1]), va.info.MLEAF = aIndices[1:].map(i => va.info.MLEAF[i-1])""" self = (self.annotate_variants_expr('va.callstats = gs.callStats(g => v), va.info.DP = gs.map(g => g.dp).sum()') .annotate_variants_expr('va.info.AC = va.callstats.AC[1:], va.info.AF = va.callstats.AF[1:], va.info.AN = va.callstats.AN') .filter_alleles('va.callstats.AC[aIndex] == 0', annotation = annotation, keep = False) ) return self VariantDataset.updateInfo = updateInfo
Note that you want to update the VCF header after you made those changes to make sure that it is compatible with downstream tools (mostly GATK). You can do that by