Hi everyone, I apologize if this question does not fall within the purview of this forum.
I was going through the gnomAD constraint calculation code (line 318) and had some questions regarding the calculation of the mutation probabilities.
I do not know the shape of the data inside genome_ht and context_ht, as I have never used Hail.

My question is: how are the probabilities of a base mutation calculated? I find the supplementary documentation vague when it describes this calculation.

I have narrowed the calculation down to three candidates, defined in terms of the following quantities:
A = number of AAA>ATA mutations in the whole genome.
B = total number of mutations in the AAA context in the whole genome, i.e. AAA>ATA + AAA>ACA + AAA>AGA mutations.
C = number of times AAA occurs in the whole genome, counting overlapping occurrences. In the sequence AAAAA, AAA occurs 3 times.

Which of the following is the correct equation for the probability of an AAA>ATA mutation?

  1. A/B
  2. A/C
  3. A/(C*3)
  4. Or am I completely wrong?
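
To make the three candidates concrete, here is a minimal Python sketch on made-up toy data. The sequence, the mutation list, and the helper name count_context are all hypothetical and are not taken from the gnomAD code. It only shows how A, B and C would be counted and what each ratio evaluates to; it does not say which one gnomAD actually uses.

```python
from collections import Counter


def count_context(seq: str, context: str) -> int:
    """Count occurrences of `context` in `seq`, including overlapping ones."""
    k = len(context)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == context)


# Hypothetical toy "genome" and observed mutations as (ref_context, alt_context).
toy_sequence = "AAAAACGAAAAT"
toy_mutations = [
    ("AAA", "ATA"),
    ("AAA", "ATA"),
    ("AAA", "ACA"),
    ("AAA", "AGA"),
    ("CGA", "CAA"),
]

mutation_counts = Counter(toy_mutations)

# A: number of AAA>ATA mutations.
A = mutation_counts[("AAA", "ATA")]
# B: all mutations whose reference context is AAA.
B = sum(n for (ref, _alt), n in mutation_counts.items() if ref == "AAA")
# C: overlapping occurrences of AAA in the sequence.
C = count_context(toy_sequence, "AAA")

option_1 = A / B        # share of AAA mutations that are AAA>ATA
option_2 = A / C        # AAA>ATA mutations per AAA site
option_3 = A / (C * 3)  # AAA>ATA mutations per possible substitution at AAA sites

print(f"A={A}, B={B}, C={C}")
print(f"A/B={option_1:.3f}, A/C={option_2:.3f}, A/(C*3)={option_3:.3f}")
```

Note that option 1 is normalized by observed mutations only, while options 2 and 3 are normalized by how often the context occurs, so the three ratios generally differ.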

Also, what's the logic behind the correction factor used to calculate mu from these probabilities?
Thanks a lot.

hi, this forum is primarily for support for the hail library, so i think you would likely have more luck reaching out to the gnomad team directly over email. hope that helps!

Thanks, I’ll do that.

@agastya You might also try the brand new gnomAD forum: