Updating info field after allele filter for non-split VDS

Info fields are out of dated once allele specific filters are applied. It will also happen when subsetting samples. In most of occasions, where VDS is split into biallelics, you can directly use variant_qc to update AF, AN and AC info fields. But sometimes we want to want with an non-split VDS, which variant_qc is no longer applicable. We can use filter_alleles, callStats and expression language to solve this problem. Here is a module that I write to accomplish this task, I will see it useful for people with similar needs.

def updateInfo(self):
    '''
    This module update 6 info fields, including AF, AC, AN, DP, MLEAF, MLEAC
    while applying allele specific filters/subsetting samples on a non-splitted VDS.
    '''
    annotation = """va.info.AC = aIndices[1:].map(i => va.info.AC[i-1]),
        va.info.AF = aIndices[1:].map(i => va.info.AF[i-1]),
        va.info.MLEAC = aIndices[1:].map(i => va.info.MLEAC[i-1]),
        va.info.MLEAF = aIndices[1:].map(i => va.info.MLEAF[i-1])"""
    self = (self.annotate_variants_expr('va.callstats = gs.callStats(g => v), va.info.DP = gs.map(g => g.dp).sum()')
        .annotate_variants_expr('va.info.AC = va.callstats.AC[1:], va.info.AF = va.callstats.AF[1:], va.info.AN = va.callstats.AN')
        .filter_alleles('va.callstats.AC[aIndex] == 0', annotation = annotation, keep = False)
    )
    return self

VariantDataset.updateInfo = updateInfo

Note that you want to update the VCF header after you made those changes to make sure that it is compatible with downstream tools (mostly GATK). You can do that by set_va_attributes.

2 Likes