'Internal frequency' in Hail

Hi all,

I can’t find a consensus in the literature on the term “internal frequency” in large-scale genomic studies. I’m new to this topic, and I would like to confirm that people use ‘internal’ allele frequency to mean the frequency within the population/cohort under study, as opposed to allele frequencies annotated from an external source/dataset like ExAC, 1000Genomes, etc.

When using the ‘variant_qc()’ method in Hail, I get AF among the QC metrics. From the source code (https://github.com/hail-is/hail/blob/b226e1f70f338dea953d58c8706ff42fd74f4992/src/main/scala/is/hail/methods/VariantQC.scala) I can see that the formula for AF is:

AF = (nHet + 2(nHomVar)) / 2(nCalled), where

nCalled = nHomRef + nHet + nHomVar

So, can this AF be interpreted as the ‘internal allele frequency’?


In short, yes. Longer answer: I don’t think “internal frequency” is standardized terminology in the field, but that’s how I would interpret it if I heard it: the allele’s frequency in the dataset being analyzed, as opposed to its frequency in some external reference dataset like gnomAD. Your formulae are correct. (I don’t see “internal frequency” used in our docs; let me know if I’m overlooking it.)
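
To make the arithmetic concrete, here is a minimal sketch of the formula from the question in plain Python. It is not Hail’s API: the function name `internal_allele_frequency` and the example counts are made up for illustration, it assumes a biallelic site, and returning NaN when no genotypes are called is my own choice for the sketch rather than a statement about what Hail does.

```python
def internal_allele_frequency(n_hom_ref: int, n_het: int, n_hom_var: int) -> float:
    """Alternate-allele frequency within the cohort at a biallelic site.

    Hypothetical helper for illustration; mirrors the formula quoted in the
    question: AF = (nHet + 2 * nHomVar) / (2 * nCalled).
    """
    # Only samples with a called genotype contribute to the denominator.
    n_called = n_hom_ref + n_het + n_hom_var
    if n_called == 0:
        # Assumption for this sketch: no called genotypes means AF is undefined.
        return float("nan")
    # Each het carries one alternate allele, each hom-var carries two;
    # each called diploid sample contributes two alleles in total.
    return (n_het + 2 * n_hom_var) / (2 * n_called)


# Made-up counts: 90 hom-ref, 8 het, 2 hom-var samples.
af = internal_allele_frequency(n_hom_ref=90, n_het=8, n_hom_var=2)
print(af)  # 0.06
```

Because the denominator counts only called genotypes, samples with missing calls at a site are excluded from that site’s frequency, so the effective sample size can differ from variant to variant.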