Hi all, I have used hl.de_novo to call de novo mutations, and I am trying to understand the “CONFIDENCE” field. I understand that the thresholds are defined on the GitHub page of the original caller (ksamocha/de_novo_scripts).
However, checking my confidence field, I made this observation:
For “HIGH” confidence, isn’t p_dn > 0.99 a requirement? Can anyone explain this discrepancy, please? I'd really appreciate any input!
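There’s a case in the calling algorithm that results in HIGH confidence for high-depth proband calls (kid.DP > 10) with p_de_novo > 0.5, as long as kid_ad_ratio > 0.3 and n_alt_alleles < 10. It is the third condition of the first .when in the .default branch: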
                      .when((p_de_novo > 0.99) & (kid_ad_ratio > 0.3) & (n_alt_alleles == 1),
                            hl.struct(p_de_novo=p_de_novo, confidence='HIGH'))
                      .when((p_de_novo > 0.5) & (kid_ad_ratio > 0.3) & (n_alt_alleles <= 5),
                            hl.struct(p_de_novo=p_de_novo, confidence='MEDIUM'))
                      .when(kid_ad_ratio > 0.2,
                            hl.struct(p_de_novo=p_de_novo, confidence='LOW'))
                      .or_missing())
                .default(hl.case()
                         .when(((p_de_novo > 0.99) & (kid_ad_ratio > 0.3) & (dp_ratio > 0.2))
                               | ((p_de_novo > 0.99) & (kid_ad_ratio > 0.3) & (n_alt_alleles == 1))
                               | ((p_de_novo > 0.5) & (kid_ad_ratio > 0.3) & (n_alt_alleles < 10) & (kid.DP > 10)),
                               hl.struct(p_de_novo=p_de_novo, confidence='HIGH'))
                         .when((p_de_novo > 0.5) & ((kid_ad_ratio > 0.3) | (n_alt_alleles == 1)),
                               hl.struct(p_de_novo=p_de_novo, confidence='MEDIUM'))
                         .when(kid_ad_ratio > 0.2,
                               hl.struct(p_de_novo=p_de_novo, confidence='LOW'))
                         .or_missing()))
        return hl.bind(solve, p_de_novo)

    def call_hemi(kid_pp, parent, parent_pp, kid_ad_ratio):
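To make the thresholds easier to check by hand, here is a rough plain-Python restatement of the two quoted branches. It is only a sketch: the function names first_branch_confidence and default_branch_confidence are made up for illustration, it does not call Hail, and it covers only the conditions visible in the excerpt (any earlier filters in the caller are not modelled). None stands in for .or_missing().

```python
from typing import Optional


def first_branch_confidence(p_de_novo: float, kid_ad_ratio: float,
                            n_alt_alleles: int) -> Optional[str]:
    """Mirror of the first quoted hl.case() chain (hypothetical helper)."""
    if p_de_novo > 0.99 and kid_ad_ratio > 0.3 and n_alt_alleles == 1:
        return 'HIGH'
    if p_de_novo > 0.5 and kid_ad_ratio > 0.3 and n_alt_alleles <= 5:
        return 'MEDIUM'
    if kid_ad_ratio > 0.2:
        return 'LOW'
    return None  # corresponds to .or_missing()


def default_branch_confidence(p_de_novo: float, kid_ad_ratio: float,
                              dp_ratio: float, n_alt_alleles: int,
                              kid_dp: int) -> Optional[str]:
    """Mirror of the quoted .default(...) chain (hypothetical helper)."""
    high = (
        (p_de_novo > 0.99 and kid_ad_ratio > 0.3 and dp_ratio > 0.2)
        or (p_de_novo > 0.99 and kid_ad_ratio > 0.3 and n_alt_alleles == 1)
        # The case discussed above: p_de_novo only needs to exceed 0.5
        # when the proband depth is above 10.
        or (p_de_novo > 0.5 and kid_ad_ratio > 0.3
            and n_alt_alleles < 10 and kid_dp > 10)
    )
    if high:
        return 'HIGH'
    if p_de_novo > 0.5 and (kid_ad_ratio > 0.3 or n_alt_alleles == 1):
        return 'MEDIUM'
    if kid_ad_ratio > 0.2:
        return 'LOW'
    return None  # corresponds to .or_missing()


# Same call, p_de_novo = 0.6, evaluated under each branch.
print(first_branch_confidence(0.6, 0.4, 2))
print(default_branch_confidence(0.6, 0.4, 0.1, 2, 30))
```

With p_de_novo = 0.6, kid_ad_ratio = 0.4, two alt alleles and a proband depth of 30, the first chain gives MEDIUM while the default chain gives HIGH, which is the kind of call the question is about. Dropping the proband depth to 10 or below in the default chain brings the same call back to MEDIUM.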