Hardy-Weinberg equilibrium (HWE) p-values are computed automatically by the variantqc command. However, variantqc computes them from the genotypes of all samples present in the dataset. We have added an HWE aggregator, hardyWeinberg(), to the Hail Expression Language that lets you compute HWE p-values on a subset of samples. For example, to compute the HWE p-value in cases and controls separately, use the following command:
vds.annotate_variants_expr('va.hweCase = gs.filter(g => sa.pheno == "Case").hardyWeinberg(), va.hweControl = gs.filter(g => sa.pheno == "Control").hardyWeinberg()')
The output schema will have the following format:
va: Struct {
    hweCase: Struct {
        rExpectedHetFrequency: Double,
        pHWE: Double
    },
    hweControl: Struct {
        rExpectedHetFrequency: Double,
        pHWE: Double
    }
}
For more information, see the documentation of pHWE on the Hail website.
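To give a feel for what a per-subset HWE p-value measures, here is a minimal, standalone Python sketch of a textbook exact test computed from genotype counts. It is an illustration only: the function name hwe_exact_pvalue and its arguments are hypothetical, it does not call Hail, and it is not claimed to match Hail's own implementation or its numerical results.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_hom_ref, n_het, n_hom_var):
    """Two-sided exact HWE p-value from genotype counts (illustrative sketch).

    Conditions on the observed allele counts and sums the probabilities of
    all heterozygote counts that are no more likely than the observed one.
    """
    n = n_hom_ref + n_het + n_hom_var      # number of called samples
    n_a = 2 * n_hom_ref + n_het            # reference allele count
    n_b = 2 * n_hom_var + n_het            # alternate allele count

    def log_prob(het):
        # Log-probability of seeing `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2)
                + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1)
                - lgamma(2 * n + 1))

    # Feasible heterozygote counts share the parity of the minor allele count.
    minor = min(n_a, n_b)
    probs = {het: exp(log_prob(het)) for het in range(minor % 2, minor + 1, 2)}

    observed = probs[n_het]
    # Small relative tolerance so floating-point ties count as "as extreme".
    p_value = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(p_value, 1.0)


# Hypothetical genotype counts for two sample subsets at one variant.
p_balanced = hwe_exact_pvalue(25, 50, 25)   # counts close to HWE expectation
p_no_hets = hwe_exact_pvalue(50, 0, 50)     # no heterozygotes at all
```

In this sketch, counts near the Hardy-Weinberg expectation give a p-value close to 1, while a subset with no heterozygotes despite a common alternate allele gives a very small one. Running it separately on case counts and control counts mirrors the idea behind va.hweCase and va.hweControl.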