Wilcoxon ranksum test

konradjk · September 25, 2017, 5:21pm

Any plans for a Wilcoxon ranksum test in the expression language?

jbloom · September 25, 2017, 6:02pm

Can you educate us on the use case? Are you thinking of the input as two non-empty arrays of numeric values, and the output as the 2-sided p-value coming from the normal approximation for the null distribution?
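For concreteness, here is a minimal sketch of the computation described in that question: two non-empty numeric arrays in, a 2-sided p-value from the normal approximation out. It is plain Python rather than anything in Hail's expression language; the names `midranks` and `ranksum_test` are made up for illustration, and the variance deliberately omits the tie correction.

```python
import math

def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def ranksum_test(x, y):
    """Return (U, z, two-sided p) for sample x against sample y."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both arrays must be non-empty")
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[:n1])               # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2        # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u1, z, p
```

With many ties the uncorrected variance is slightly too large, so a production version would probably want the tie-corrected form.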

konradjk · September 25, 2017, 6:15pm

Yep, in my case I want to do the test on 2 arrays (of allele frequencies across variants, but I’m happy to coerce the data into arrays in a KeyTable).

konradjk · August 19, 2022, 12:56pm

Bumping a 5-year-old(!) thread. Are there recent thoughts on an aggregator that might implement a Wilcoxon ranksum? An approximate version is probably fine, using the quantile data within the approximate median framework.

tpoterba · August 19, 2022, 1:08pm

Curious to hear others’ thoughts, but it seems like we could compute an approximate test statistic with a two-pass approach: the first pass computes the approximate quantiles and the N, and the second pass computes the sum of approximate ranks (where each rank is computed from the quantiles and N).
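A rough sketch of that two-pass idea, in plain Python rather than as a Hail aggregator. The quantile summary here is computed exactly from a sorted copy, standing in for whatever approximate quantile sketch the first pass would really use; `quantile_summary`, `approx_rank`, `approx_rank_sum` and the parameter `k` are all hypothetical names.

```python
import bisect

def quantile_summary(values, k):
    """Pass 1 stand-in: keep k + 1 evenly spaced order statistics, plus N."""
    s = sorted(values)
    n = len(s)
    return [s[round(i * (n - 1) / k)] for i in range(k + 1)], n

def approx_rank(v, qs, n):
    """Approximate 1-based rank of v by interpolating between stored quantiles."""
    k = len(qs) - 1
    if v <= qs[0]:
        return 1.0
    if v >= qs[-1]:
        return float(n)
    i = bisect.bisect_right(qs, v) - 1           # qs[i] <= v < qs[i + 1]
    frac = (v - qs[i]) / (qs[i + 1] - qs[i])
    # Note: repeated values get the upper end of their run, not an averaged rank.
    return 1 + (i + frac) * (n - 1) / k

def approx_rank_sum(x, y, k=100):
    """Approximate rank sum of x within the pooled sample x + y."""
    qs, n = quantile_summary(list(x) + list(y), k)   # pass 1: quantiles and N
    return sum(approx_rank(v, qs, n) for v in x)     # pass 2: sum of approximate ranks
```

The U statistic would then follow as the rank sum minus `len(x) * (len(x) + 1) / 2`. The sketch makes no attempt to average the ranks of repeated values.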

patrick-schultz · August 19, 2022, 1:21pm

That was my first thought as well. But it’s not obvious to me how to handle the averaging of ranks of repeated values (going off of the description on Wikipedia). I’d have to think about it for a bit, but it also seems possible you could compute an approximate test statistic directly from the approx cdfs of the two distributions.
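One way this could look, as a sketch only: treat each distribution's summary as a sorted list of `(value, weight)` pairs with distinct values, and count the weighted pairs where x exceeds y, with tied pairs counted as one half. The half weight for ties gives the same U as averaging the ranks of repeated values. The `(value, weight)` representation and the names `to_weighted` and `u_from_weighted` are assumptions for illustration, not the layout of Hail's approx cdf output.

```python
from collections import Counter

def to_weighted(values):
    """Collapse raw values into sorted (value, weight) pairs with distinct values."""
    return sorted(Counter(values).items())

def u_from_weighted(xs, ys):
    """U for xs: weighted count of pairs with x > y, plus half of the tied pairs."""
    u = 0.0
    below = 0.0  # total weight of ys strictly less than the current x value
    j = 0
    for xv, xw in xs:
        # Single merge pass: advance through ys while they are smaller than xv.
        while j < len(ys) and ys[j][0] < xv:
            below += ys[j][1]
            j += 1
        tied = ys[j][1] if j < len(ys) and ys[j][0] == xv else 0.0
        u += xw * (below + 0.5 * tied)
    return u
```

A quick consistency check is that `u_from_weighted(xs, ys) + u_from_weighted(ys, xs)` equals the product of the two total weights. With an approximate summary the weights would be estimated counts, so the resulting U is only as accurate as the summaries.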

patrick-schultz · August 19, 2022, 1:39pm

I think I’ve convinced myself the second pass isn’t necessary. Next week I can try implementing a function to compute the U statistic given two approx cdfs.

@konradjk What is your timeline for wanting to use this?

konradjk · August 19, 2022, 2:43pm

No strong urgency, but good to have it on the roadmap!
