I’m using Hail to analyse multisample VCFs generated by Dragen’s iterative genotyper. As in Hail’s VDS format, the VCF contains local-allele PLs (LPL) instead of the standard global PLs output by other methods. I can therefore use the vds.local_to_global function to impute dummy global PL values.
My query pertains to the order of the values in the LPL entries. While this question may be best answered by the Dragen devs, I thought someone on the Hail team might also understand how LPLs are generated well enough to help me out.
As I understand it, the 0 value in the LPL (and PL) entry always corresponds to the called genotype. There should therefore be no cases in which, for example, a homozygous reference genotype has a 0 LPL value in any position except the first, regardless of which or how many alt alleles are also reported.
However, in my msVCF there are many entries where this isn’t the case, e.g. (locus, alleles, GT, LPL, local alt alleles):
chr1:10177 ["A","AC","C","<NON_REF>"] 0/0 [20,0,32,132,132,132] ["2","3"]
So this is a homozygous ref call at a site where alt allele 2 (‘C’) and NON_REF are also reported as local alleles. The 0 is in the second position of the LPL, which, according to the allele order in the last field (the local alt alleles), should correspond to genotype 0/2. So, is this LPL field incorrect, or am I wrong about the 0 LPL always corresponding to the called genotype? Or could I be misinterpreting the order of the LPL values here?
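For reference, here is how I am reading the LPL order, written out as a small Python sketch. It assumes the standard VCF ordering of unphased diploid genotypes (the genotype j/k with j <= k sits at index k*(k+1)/2 + j) applied to the local allele indices, and that the last field lists the global indices of the local alt alleles. Whether Dragen actually follows this convention is exactly what I am unsure about, and the function names are just mine.

```python
def genotype_order(n_alleles):
    """Unphased diploid genotypes (j, k), j <= k, in standard VCF PL order."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def label_lpl(lpl, local_alt_alleles):
    """Pair each LPL value with the global genotype it would refer to.

    Assumption: local allele 0 is the reference, and local allele i (i >= 1)
    is the global allele given by local_alt_alleles[i - 1].
    """
    local_to_global = [0] + [int(a) for a in local_alt_alleles]
    order = genotype_order(len(local_to_global))
    if len(order) != len(lpl):
        raise ValueError("LPL length does not match the number of local alleles")
    return [
        ((local_to_global[j], local_to_global[k]), pl)
        for (j, k), pl in zip(order, lpl)
    ]


# The entry from chr1:10177 above.
labelled = label_lpl([20, 0, 32, 132, 132, 132], ["2", "3"])
for (a, b), pl in labelled:
    print(f"{a}/{b}: {pl}")
```

Under these assumptions the six values map to 0/0, 0/2, 2/2, 0/3, 2/3, 3/3 in that order, so the 0 lands on 0/2 rather than on the called 0/0.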