Hi everyone, I apologize if this question falls outside the scope of this forum.

I was going through the gnomAD constraint calculation code (https://github.com/broadinstitute/gnomad_lof/blob/master/constraint_utils/constraint_basics.py line 318) and had some questions regarding the calculation of the mutation probabilities.

I have never used Hail, so I do not know the shape of the data inside genome_ht and context_ht.

My question is: how are the probabilities of a base mutation calculated? I find the supplementary documentation vague on how they are computed.

I have narrowed the calculation down to three candidates. They use the following quantities:

A = number of AAA>ATA mutations in the whole genome.

B = total number of mutations in the AAA context in the whole genome, i.e. AAA>ATA + AAA>ACA + AAA>AGA.

C = number of times AAA occurs in the whole genome, counting overlapping occurrences. For example, in the sequence AAAAA, AAA occurs 3 times.
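
To be explicit about what I mean by C, here is a minimal Python sketch of the overlapping count. The function name count_overlapping is my own and is not taken from the gnomAD code; I am only using it to pin down the definition.

```python
def count_overlapping(sequence: str, context: str) -> int:
    """Count occurrences of `context` in `sequence`, allowing overlaps."""
    k = len(context)
    # Slide a window of length k over the sequence one base at a time.
    return sum(sequence[i:i + k] == context for i in range(len(sequence) - k + 1))


print(count_overlapping("AAAAA", "AAA"))  # 3
```

Note that Python's built-in "AAAAA".count("AAA") returns 1 because it skips overlapping matches, which is not the count I have in mind.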

Which of the following is the correct equation for the probability of an AAA>ATA mutation?

- A/B
- A/C
- A/(C*3)
- Or am I completely wrong?
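
To make the three options concrete, here is a toy Python sketch. The counts are made up purely for illustration (they are not real gnomAD numbers, and nothing here is taken from the gnomAD code); it only shows how differently the three candidates behave.

```python
from fractions import Fraction

# Made-up toy counts, for illustration only (not real gnomAD numbers).
A = 20    # observed AAA>ATA mutations
B = 50    # observed mutations in the AAA context (AAA>ATA + AAA>ACA + AAA>AGA)
C = 1000  # overlapping occurrences of AAA in the genome

candidates = {
    "A/B": Fraction(A, B),          # share of AAA-context mutations that are AAA>ATA
    "A/C": Fraction(A, C),          # AAA>ATA mutations per AAA site
    "A/(C*3)": Fraction(A, C * 3),  # per AAA site, further divided by the 3 possible substitutions
}

for name, value in candidates.items():
    print(f"{name} = {float(value):.4f}")
```

With these toy numbers the three candidates come out as 0.4000, 0.0200 and 0.0067, so which one is used matters a lot.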

Also, what is the logic behind the correction factor used to calculate mu from these probabilities?

Thanks a lot