We just hit the VCFParseError: ploidy > 2 not supported when running Hail 0.2.40 on a gatk4-mutect2 generated VCF. The existing solution for handling multi-allelic loci is to eliminate the entries, but that does not seem reasonable in a somatic calling scenario. Following this thread from 0.1, which says Hail is going to support ploidy > 2, we’re wondering: is there a time frame for supporting ploidy > 2 VCFs?
Thank you for your note, Obigbando!
I’m not sure, but I can ask one of our engineers whether this is in development. @tpoterba?
Supporting ploidy > 2 is still on the roadmap, but I don’t expect this to be scheduled within the next few months. It’s a little tricky from a technical perspective, and I think we need some infrastructure changes that we’re working on currently.
I’m guessing the alt allele is part of the entry key, and an entry with a multi-allelic key would make later matching difficult. If that is the case, we ran into the same issue in our SeqsLab annotation project, where we used Elasticsearch as the variant annotation/interpretation engine. Our approach was to split each multi-allelic variant into multiple entries before key encoding. I don’t know if that fits Hail’s situation?
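To make the splitting idea concrete, here is a minimal sketch in plain Python. It is not Hail code and not our actual SeqsLab implementation: the `Variant` type, the `split_multiallelic` function, and the `chrom:pos:ref:alt` key format are all made up for illustration, and it does no allele trimming or left-alignment.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified variant record (illustration only)."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]  # one or more alternate alleles


def split_multiallelic(variant: Variant) -> List[Dict[str, object]]:
    """Return one entry per alternate allele, each with its own key.

    The key format "chrom:pos:ref:alt" is an assumption for this sketch.
    """
    entries = []
    # VCF allele indices start at 1 for the first alternate allele (0 is REF).
    for alt_index, alt in enumerate(variant.alts, start=1):
        entries.append({
            "key": f"{variant.chrom}:{variant.pos}:{variant.ref}:{alt}",
            "alt": alt,
            "alt_index": alt_index,  # keeps the link to the original record
        })
    return entries


if __name__ == "__main__":
    site = Variant("chr1", 100, "A", ["T", "G"])
    for entry in split_multiallelic(site):
        print(entry["key"])
```

Each resulting entry carries exactly one alt allele in its key, so lookups never have to match against a multi-allelic key. Keeping `alt_index` lets you trace an entry back to its position in the original ALT list.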