Added ability to compute HWE on subsets of samples

jigold · November 7, 2016, 6:57pm

Hardy-Weinberg Equilibrium (HWE) p-values are computed automatically by the variantqc command. However, variantqc computes them from the genotypes of all samples in the dataset.

We have added an HWE aggregator, hardyWeinberg(), to the Hail Expression Language that lets you compute HWE p-values on a subset of samples. For example, to compute the HWE p-value in cases and controls separately, use the following command:

vds.annotate_variants_expr('va.hweCase = gs.filter(g => sa.pheno == "Case").hardyWeinberg(), va.hweControl = gs.filter(g => sa.pheno == "Control").hardyWeinberg()')

The output schema will have the following format:

va: Struct {
    hweCase: Struct {
        rExpectedHetFrequency: Double,
        pHWE: Double
    },
    hweControl: Struct {
        rExpectedHetFrequency: Double,
        pHWE: Double
    }
}
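To make the two fields more concrete, here is a minimal, standalone Python sketch of a per-group HWE computation. It is not Hail code and does not claim to match Hail's numerics: it assumes the textbook expected heterozygote frequency 2pq and a standard two-sided exact test conditioned on the allele counts. The function names (hwe_stats, hwe_by_group) and the output keys are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def _log_prob_hets(n_het, n_a, n_b):
    """Log-probability of seeing n_het heterozygotes given allele counts n_a, n_b under HWE."""
    n = (n_a + n_b) // 2          # number of samples
    n_aa = (n_a - n_het) // 2     # implied homozygotes for allele A
    n_bb = (n_b - n_het) // 2     # implied homozygotes for allele B
    return (math.lgamma(n + 1)
            - math.lgamma(n_aa + 1) - math.lgamma(n_het + 1) - math.lgamma(n_bb + 1)
            + n_het * math.log(2)
            + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
            - math.lgamma(2 * n + 1))


def hwe_stats(n_hom_ref, n_het, n_hom_var):
    """Illustrative HWE summary from genotype counts (keys are hypothetical, not Hail's)."""
    n = n_hom_ref + n_het + n_hom_var
    if n == 0:
        return {"expected_het_freq": float("nan"), "p_hwe": float("nan")}
    n_a = 2 * n_hom_ref + n_het
    n_b = 2 * n_hom_var + n_het
    p = n_a / (2 * n)
    expected_het_freq = 2 * p * (1 - p)  # textbook 2pq; an assumption, not Hail's formula

    # Two-sided exact test: sum the probabilities of all het counts that are
    # no more likely than the observed one. Het counts share the parity of n_a.
    log_obs = _log_prob_hets(n_het, n_a, n_b)
    p_value = 0.0
    for h in range(n_a % 2, min(n_a, n_b) + 1, 2):
        log_p = _log_prob_hets(h, n_a, n_b)
        if log_p <= log_obs + 1e-9:
            p_value += math.exp(log_p)
    return {"expected_het_freq": expected_het_freq, "p_hwe": min(1.0, p_value)}


def hwe_by_group(alt_counts, phenotypes):
    """Compute hwe_stats separately per phenotype label; None genotypes are skipped."""
    counts = defaultdict(Counter)
    for g, pheno in zip(alt_counts, phenotypes):
        if g is not None:
            counts[pheno][g] += 1
    return {pheno: hwe_stats(c[0], c[1], c[2]) for pheno, c in counts.items()}


# Genotypes are alternate-allele counts (0, 1, 2) for a single variant.
result = hwe_by_group([0, 2, 1, 1, None],
                      ["Case", "Case", "Control", "Control", "Case"])
```

Here result has one entry per phenotype label, mirroring the hweCase / hweControl split in the schema above.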

For more information, see the documentation of pHWE on the Hail website.
