Hi,
While trying to use Hail to analyse SVs from some in-house WGS and “reproduce” some of the analysis done in the gnomAD-SV paper, we ran into some issues with the proper way to handle partially missing genotype calls.
In short, unlike a typical SNP/indel variant-calling pipeline, SV analysis creates some “1/.” entries for duplication (DUP) events, where it is common to be uncertain whether one allele carries n copies of the duplication and the other is WT, or whether the two alleles carry m and n-m copies, respectively.
In another forum post some years back, to work around Hail's lack of support for “partially missing genotype” calls, it was proposed to encode those as “haploid ALT” (i.e. supply the calls as “1” in lieu of “1/.” or “./1”).
While this “solves” the data-ingestion shortcoming, it then turns such calls into hom_var, so my questions are:
- How have such DUP calls (where it is uncertain whether one allele has n copies of the duplication and the other is WT, or whether each allele has a different number of copies, an uncertainty that results in a “1/.” VCF representation of the event) been handled in your analysis of DUPs in gnomAD-SV?
- If not with the haploid ALT encoding (because it imposes a hom_var over a possible het interpretation of the genotype calls), how has it been tackled more elegantly by the gnomAD-SV team?
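To make the concern concrete, here is a minimal plain-Python sketch of what happens to a “1/.” call under the haploid ALT recoding. It does not use Hail's API: `parse_gt`, `classify` and `haploid_workaround` are made-up helper names, and the rule that a lone ALT allele counts as hom_var is an assumption chosen to mirror the behaviour described above, not something taken from the gnomAD-SV pipeline.

```python
import re

def parse_gt(gt):
    """Split a VCF GT string into alleles; '.' becomes None."""
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]

def classify(alleles):
    """Label a call, keeping partially missing calls as their own class."""
    if any(a is None for a in alleles):
        called = [a for a in alleles if a is not None]
        return "partial_missing" if called else "missing"
    if all(a == 0 for a in alleles):
        return "hom_ref"
    # Assumption: every called allele being the same ALT counts as hom_var,
    # which is also what happens to a haploid "1".
    if all(a == alleles[0] for a in alleles):
        return "hom_var"
    return "het"

def haploid_workaround(gt):
    """Drop missing alleles, mimicking the '1/.' -> '1' recoding."""
    called = [a for a in parse_gt(gt) if a is not None]
    return "/".join(str(a) for a in called) if called else "."

for gt in ["0/1", "1/1", "1/.", "./1"]:
    recoded = haploid_workaround(gt)
    print(gt, classify(parse_gt(gt)), "->", recoded, classify(parse_gt(recoded)))
```

In this sketch “1/.” and “./1” are labelled partial_missing before recoding and hom_var afterwards, which is exactly the loss of the possible het interpretation we are asking about.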
Thanks in advance