HWE p-values differ from Plink

rmitchell · March 12, 2017, 7:49pm

Hello,

I’ve been comparing statistics gathered from plink and I noticed that many variants did not match up. I can attribute some of this to bilallelic vs. >2 alt alleles. But when I look at only biallelic sites, there are many sites that plink has a pHWE = 1, while the pHWE calculated using Hail is much different (and varies significantly). Any idea why this is happening?

Thanks!

jbloom · March 12, 2017, 8:16pm

Hi Rebecca, thanks for writing! Could you help us reproduce this on our end? In particular, what are the Plink and Hail scripts you are running? And is there an example on public or synthetic data showing the discrepancies that we can investigate?

pHWE = 1 usually indicates an edge case involving zeroes, so it’s certainly possible we handle these differently. We fixed an issue with the Fisher Exact Test last week whereby we had forgotten to check whether either column in the 2 x 2 array was all zero (we were only checking the rows). In that case, Hail now just returns 1.

rmitchell · March 13, 2017, 1:38pm

Here’s the hail script I used (sorry, it wouldn’t let me attach it as a file):

vds=hc.import_vcf('/work-zfs/darking1/active/projects/BAZ2B/SureSelect/Sequencing/variant.filtering/targetedregions/final.recode.vcf')
vds = vds.split_multi()
vds = vds.annotate_samples_table('/work-zfs/darking1/active/projects/BAZ2B/SureSelect/Sequencing/variant.filtering/targetedregions/allsamples.pheno.updated.txt',
                                 root='sa.pheno',
                                 sample_expr='Sample',
                                 config=TextTableConfig(impute=True))
vds.count()
vds.count(genotypes=True)
vds_vCR = (vds.sample_qc())
vds_vCR.export_samples('filteredsampleqc.txt', 'Sample = s.id, sa.qc.*')
vds_vCR = (vds.sample_qc())
vds_gAB_sCR_sGQ = (vds
    .annotate_samples_vds(vds_vCR, code = 'sa.qc = vds.qc' )
    .filter_samples_expr('sa.qc.gqMean >= 20'))

##variant QC

vds_gAB_sCR_sGQ = vds_gAB_sCR_sGQ.variant_qc()
variantqc_table = vds_gAB_sCR_sGQ.variants_keytable().to_pandas()
vds_gAD_sCR_sGQ_vGQ= vds_gAB_sCR_sGQ.filter_variants_expr('va.qc.gqMean >= 20')	
vds_gAD_sCR_sGQ_vGQ.export_variants('filteredvariants.2.txt', 'Variant=va.rsid, Chr=v.contig, Start=v.start, Ref=v.ref, Alt.alleles=v.altAlleles, N.alleles=v.nAltAlleles(), va.qc.*')

Plink was simply:

plink --vcf ../ALL.pos.recode.vcf --keep hailsamples.txt --hardy --make-bed --out all.hail

I used the same samples and same variants for both programs. I also didn’t filter genotypes based on allelic balance, in order to be as consistent as possible between the two programs.

I’m attaching one plot, one showing the comparison of pHWE from Hail vs. pHWE from plink restricted to only biallelic variants. (I also have the same plot for all variants, but the forum will only allow me to upload one plot).

I can also send you the summary files from hail and plink if that would help at all. I’m doing this on ~1600 samples.

jbloom · March 14, 2017, 8:15pm

That’s a great plot! Based on it, I’m confident the difference is due to how Plink and Hail calculate pHWE.

Our documentation of the pHWE function includes this write up describing the math of our implementation with references. But we do not make clear that we indeed use mid-p correction on the exact test, which Plink does not. I’ll update the docs accordingly.

With this correction, the right, left, and exact tests sum half the probability of the observed outcome and the probabilities of “more extreme” outcomes (greater value, lesser value, or smaller probability, respectively). In the exact test, outcomes with probability equal to the observed outcome are also weighted by 1/2. We found this to be a more sensible procedure for p-values on this discrete null distribution.

While the difference between 1 and .5 is large, for small p-values the results should be very similar. This convergence in apparent in the plot above, and you could investigate further with a log-log plot zooming in near (0,0) above. If you make that plot (and there are actually points down there), it’d be awesome if you could post it here as well.

rmitchell · March 14, 2017, 11:01pm

Yup. I realized today that plink has a ‘midp’ modifier that specifies a mid-p adjustment. When I did that (and removed any variant with >2 alleles), the p-values were perfected correlated. See attached plot.

Thanks for your help! Looking forward to using Hail.

Topic		Replies	Views
Added ability to compute HWE on subsets of samples Updates	0	987	November 7, 2016
Reproduce hl.pc_project output using Plink Development	0	11	August 8, 2025
Miss PL score on hom-ref genotype after run_combiner step Hail Query & hailctl	12	572	April 27, 2023
Filtering samples with extreme heterozygosity in hail? Hail Query & hailctl	7	1234	February 19, 2020
[Breaking Change] Rename of methods/fields: ctt, chisq, hardy_weinberg, hardy_weinberg_p, variant_qc, transition_disequilibrium_test Updates	4	776	July 31, 2018

HWE p-values differ from Plink

Related topics