There is a robust implementation of the Hardy-Weinberg equilibrium (HWE) test that takes population substructure into account:

RUTH (Robust Unified Hardy-Weinberg Equilibrium Test). It is used in the TOPMed pipeline.

The GitHub repository is here:

It would be great to have an implementation in Hail to run on matrix tables containing a large number of samples with mixed ethnicity…
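To illustrate why substructure matters here, below is a minimal sketch in plain Python of the ordinary, unadjusted HWE chi-square statistic. It is not RUTH and not Hail's API; the function name `hwe_chi_square` and the genotype counts are hypothetical, chosen only for illustration. Each of the two made-up subpopulations is exactly in HWE on its own, yet pooling them produces a large statistic because of the heterozygote deficit that mixing creates.

```python
def hwe_chi_square(n_hom_ref, n_het, n_hom_var):
    """Pearson chi-square statistic (1 degree of freedom) for the unadjusted HWE test.

    Assumes a polymorphic site, so every expected count is non-zero.
    """
    n = n_hom_ref + n_het + n_hom_var
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_var)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (hom-ref, het, hom-var) for two subpopulations,
# each exactly at HWE proportions for its own allele frequency.
pop_a = (810, 180, 10)   # reference allele frequency 0.9
pop_b = (10, 180, 810)   # reference allele frequency 0.1
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

chi_a = hwe_chi_square(*pop_a)
chi_b = hwe_chi_square(*pop_b)
chi_pooled = hwe_chi_square(*pooled)
```

Here `chi_a` and `chi_b` are essentially zero while `chi_pooled` is very large, so a naive test on the pooled samples would flag a variant that is well behaved within each group.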

It looks like all we need is a way to do logistic regression on an arbitrary array/ndarray. Thanks for the submission; I think this should be feasible to implement before too long.
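For anyone wondering what "logistic regression on an arbitrary array" involves, here is a rough sketch in plain Python using Newton-Raphson updates. It is only an illustration under stated assumptions: it does not use Hail or its ndarray support, it is not how RUTH or `hl.logistic_regression_rows` is implemented, and the names `solve` and `fit_logistic` as well as the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(xs, ys, n_iter=25, tol=1e-10):
    """Fit logistic regression coefficients by Newton-Raphson.

    xs is a list of covariate rows (include a leading 1.0 for the intercept);
    ys is a list of 0/1 outcomes. Assumes the data are not perfectly separable.
    """
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(xs, ys):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = mu * (1.0 - mu)                # variance weight
            for i in range(k):
                grad[i] += (y - mu) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data: one binary covariate, 2/10 successes when it is 0 and 6/10 when it is 1.
xs = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
ys = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
beta = fit_logistic(xs, ys)
```

With a single binary covariate the fit can be checked by hand: the intercept should match the log-odds in the first group and the slope should match the log odds ratio between the two groups.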

Has there been any progress on implementing a version of RUTH in Hail?

We’re still building out the ndarray infrastructure that will let us write `hl.logistic_regression_rows` in Python. I’m not sure of the exact timeline, but I’d hope to have the set of features needed to implement RUTH in Hail, in Python, done by the end of 2020.