How do you define call rate per sample vs call rate per variant? Say I have a vds of 100 samples by 10,000 variants.
If sample 1 has genotypes called at 9,000 of the 10k variants, does sample.callrate = 0.9?
If variant 1 has been detected in 50 of the 100 samples, does variant.callrate = 0.5?
Yes, in both cases callRate is the proportion of calls that are non-missing. For variants, it’s measured per row. For samples, it’s measured per column.
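To make the row/column distinction concrete, here is a minimal sketch in plain Python rather than Hail itself. The toy matrix, the `is_called` and `call_rate` helpers, and the choice to treat only `./.` as missing are all assumptions made for illustration, not Hail's API.

```python
# Toy genotype matrix: rows are variants, columns are samples.
# Assumption: "./." marks a missing call (as in VCF); anything else is called.
genotypes = [
    ["0/0", "0/1", "./.", "1/1"],  # variant 1
    ["0/1", "./.", "./.", "0/0"],  # variant 2
    ["1/1", "0/0", "0/1", "0/0"],  # variant 3
]

def is_called(gt):
    """Return True if the genotype is non-missing, regardless of its alleles."""
    return gt != "./."

def call_rate(calls):
    """Proportion of non-missing genotype calls in a row or column."""
    calls = list(calls)
    return sum(is_called(gt) for gt in calls) / len(calls)

# Per-variant call rate: one value per row.
variant_call_rates = [call_rate(row) for row in genotypes]

# Per-sample call rate: one value per column (zip(*...) transposes the matrix).
sample_call_rates = [call_rate(col) for col in zip(*genotypes)]

print(variant_call_rates)
print(sample_call_rates)
```

Note that a 0/0 call counts toward the call rate exactly like 0/1 or 1/1; only missing genotypes lower it.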
Thanks. I also found that the minor allele frequency from the variant.qc.AF metric doesn’t cluster around 0 (homozygous) and 0.5 (heterozygous) as I expected. Any explanation?
Homozygosity and heterozygosity are properties of individual genotype calls, but AF is a property of all the genotypes at one variant.
I understand that. So for a variant called in multiple samples (a mix of het, hom ref, and hom alt), how is this field defined? Thanks.
The alleles of the genotype call don’t matter. If the genotype is called at all (not ./. in VCF) then it counts as called.
Sorry, I wasn’t clear about my question: how is the va.qc.AF field defined? Say we have a total of 3 samples: 1 sample is 0/0, 1 sample is 0/1, and 1 sample is 1/1. Is AF = 0.5 (3 alt alleles / 6 possible alleles) or 2/3 (2 out of the 3 called sample genotypes carrying the alt allele)? Thanks.
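For reference, the two candidate definitions in the question can be written out as a small sketch in plain Python. The function names `alt_allele_fraction` and `carrier_fraction` are hypothetical, and the code assumes unphased diploid calls with `./.` as the only missing value. It only shows how the two numbers arise and does not by itself establish which one va.qc.AF reports.

```python
# The three example genotypes from the question.
genotypes = ["0/0", "0/1", "1/1"]

def alt_allele_fraction(gts):
    """Alt alleles divided by all called alleles (allele-based definition)."""
    alleles = [a for gt in gts if gt != "./." for a in gt.split("/")]
    return sum(a != "0" for a in alleles) / len(alleles)

def carrier_fraction(gts):
    """Called samples carrying at least one alt allele (sample-based definition)."""
    called = [gt for gt in gts if gt != "./."]
    carriers = sum(any(a != "0" for a in gt.split("/")) for gt in called)
    return carriers / len(called)

print(alt_allele_fraction(genotypes))  # 3 alt alleles out of 6 called alleles
print(carrier_fraction(genotypes))     # 2 carriers out of 3 called samples
```

The allele-based count is the conventional meaning of "allele frequency" in population genetics; whether va.qc.AF follows it is best confirmed against the Hail documentation.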