Wilcoxon ranksum test

Any plans for a Wilcoxon ranksum test in the expression language?

Can you educate us on the use case? Are you thinking of the input as two non-empty arrays of numeric values, and the output as the two-sided p-value from the normal approximation to the null distribution?
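For concreteness, here is a minimal sketch of the computation described above: two non-empty numeric arrays in, the U statistic and a two-sided normal-approximation p-value out. The function name `rank_sum_test` is hypothetical and not part of any existing API. The sketch assumes midranks for ties, a tie-corrected variance, and no continuity correction.

```python
import math

def rank_sum_test(x, y):
    """Return (U, p) for the two-sided rank-sum test, normal approximation.

    Hypothetical helper for illustration only. U is the statistic for x:
    the number of (x_i, y_j) pairs with x_i > y_j, counting ties as 1/2.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both arrays must be non-empty")
    n = n1 + n2

    # Pool the samples, tagging each value with its origin (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i:j].
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * n_from_x
        tie_term += t ** 3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    # Variance of U under the null, corrected for ties.
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # every pooled value is identical

    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```

Calling `rank_sum_test([1, 2, 3], [4, 5, 6])` gives U = 0 for the first array, since none of its values exceeds any value in the second.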

Yep, in my case I want to run the test on 2 arrays (of allele frequencies across variants, but I’m happy to coerce the data into arrays in a KeyTable).

Bumping a 5-year-old(!) thread. Are there recent thoughts on an aggregator that might implement a Wilcoxon ranksum test? An approximate version is probably fine, using the quantile data within the approximate median framework.

Curious to hear others’ thoughts, but it seems like we could compute an approximate test statistic with a two-pass approach: the first pass computes the approximate quantiles and N, and the second pass computes the sum of approximate ranks (where each rank is computed from the quantiles and N).
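A rough sketch of that two-pass idea, under loud assumptions: all function names are hypothetical, and the pass-1 "summary" here is built from exact counts as a stand-in for whatever approximate quantile sketch a real aggregator would produce. Assigning each value the midrank of its summary bucket is one possible choice, not a settled design.

```python
import bisect
from collections import Counter
from itertools import accumulate

def pooled_summary(x, y):
    """Pass 1 (stand-in): distinct pooled values plus cumulative counts.

    cum[i] is the number of pooled observations below values[i], so
    cum[-1] is N. A real aggregator would return a compressed summary.
    """
    counts = Counter(x) + Counter(y)
    values = sorted(counts)
    cum = [0] + list(accumulate(counts[v] for v in values))
    return values, cum

def approx_rank(v, values, cum):
    """Rank of v taken as the midrank of the summary bucket at or below v."""
    i = max(bisect.bisect_right(values, v) - 1, 0)
    # The bucket covers ranks cum[i] + 1 .. cum[i + 1]; use their average.
    return (cum[i] + 1 + cum[i + 1]) / 2.0

def rank_sum_two_pass(x, y):
    """Pass 2: sum the approximate ranks of the first sample."""
    values, cum = pooled_summary(x, y)
    return sum(approx_rank(v, values, cum) for v in x)
```

With an exact summary this reproduces the usual rank sum; with a coarser summary, every value in a bucket shares that bucket's midrank, which is where the approximation error would enter.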

That was my first thought as well. But it’s not obvious to me how to handle the averaging of ranks of repeated values (going off of the description on Wikipedia). I’d have to think about it for a bit, but it also seems possible that you could compute an approximate test statistic directly from the approximate CDFs of the two distributions.

I think I’ve convinced myself the second pass isn’t necessary. Next week I can try implementing a function to compute the U statistic given two approximate CDFs.
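One way such a function could look, as a sketch only: it assumes each approximate CDF is represented as a list of `(value, weight)` pairs sorted by value, where the weight is the number of observations that point stands for. That representation and the name `u_statistic` are assumptions made for illustration, not the actual implementation. Counting ties as half a pair plays the same role as averaging ranks of repeated values.

```python
import bisect
from itertools import accumulate

def u_statistic(xs, ys):
    """U for the x sample from two weighted point summaries.

    xs, ys: lists of (value, weight) pairs; ys must be sorted by value.
    Returns the weighted count of pairs with x > y, plus half the ties.
    """
    y_vals = [v for v, _ in ys]
    # y_cum[i] is the total weight of the first i points of ys.
    y_cum = [0.0] + list(accumulate(w for _, w in ys))

    u = 0.0
    for v, w in xs:
        lo = bisect.bisect_left(y_vals, v)   # index of first y point >= v
        hi = bisect.bisect_right(y_vals, v)  # index of first y point > v
        below = y_cum[lo]                    # weight of y strictly below v
        tied = y_cum[hi] - y_cum[lo]         # weight of y equal to v
        u += w * (below + 0.5 * tied)
    return u
```

When every weight is 1 this is the exact U statistic, so no second pass over the data is needed once both summaries exist; the accuracy then depends only on how well the summaries approximate the two samples.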

@konradjk What is your timeline for wanting to use this?

No strong urgency, but good to have it on the roadmap!