We intend to use Hail to analyse SVs from some in-house WGS data and “reproduce” some of the analyses done in the gnomAD-SV paper, but we stumbled on some issues with the proper way to handle partially missing genotype calls.
In short, unlike a typical SNP/indel variant-calling pipeline, SV analysis will create some “1/.” entries for DUPlication events, for which it is common to be uncertain whether one of the alleles has n copies of the duplication and the other is WT, or whether the two alleles have, respectively, m and n-m copies of the duplication.
In another forum post some years back, to work around Hail's lack of support for “partially missing genotype” calls, it was proposed to encode those as “haploid ALT” (i.e. supply the calls as “1” in lieu of “1/.”).
While this “solves” a data-ingestion shortcoming, it then turns such calls into
… so my questions are:
- How have such DUP calls (where it is uncertain whether one of the alleles has n copies of the duplication and the other is WT, or whether each allele has a varying number of copies of the duplication; an uncertainty which results in a “1/.” VCF representation of the event) been handled in your analysis of DUPs in gnomAD-SV?
- If not (because it imposes a “het” interpretation of the genotype calls), how has it been more elegantly tackled by the gnomAD-SV team?
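
For context, the “haploid ALT” recoding mentioned above could be sketched in plain Python as a pre-processing step run before import. This is only an illustration, not the code from the earlier forum post: the function names `recode_partial_gt` and `recode_vcf_line` are hypothetical, and it assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column.

```python
def recode_partial_gt(gt: str) -> str:
    """Collapse a partially missing diploid GT (e.g. '1/.' or './1') to a haploid call.

    Fully called ('0/1'), fully missing ('./.') and already-haploid ('1')
    genotypes are returned unchanged.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    called = [a for a in alleles if a != "."]
    # Only rewrite when exactly one of the two alleles is missing.
    if len(alleles) == 2 and len(called) == 1:
        return called[0]
    return gt


def recode_vcf_line(line: str) -> str:
    """Apply recode_partial_gt to the GT of every sample column in one VCF line."""
    if line.startswith("#"):
        return line  # header lines pass through untouched
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return line
    gt_idx = fmt.index("GT")
    for i in range(9, len(fields)):
        parts = fields[i].split(":")
        if gt_idx < len(parts):
            parts[gt_idx] = recode_partial_gt(parts[gt_idx])
            fields[i] = ":".join(parts)
    return "\t".join(fields) + newline
```

Note that this recoding discards the information that a second, uncalled allele exists, which is exactly why the interpretation of such calls downstream is in question.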
Thanks in advance