There is a robust implementation of the Hardy-Weinberg equilibrium (HWE) test that takes population substructure into account:
RUTH (Robust Unified Hardy-Weinberg Equilibrium Test). It is used in the TOPMed pipeline.
The GitHub repository is here:
It would be great to have an implementation in Hail that can run on MatrixTables containing a large number of samples of mixed ethnicity…
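To make the idea concrete, here is a minimal Python sketch of one way an HWE test can account for substructure. Each sample gets its own individual-specific allele frequency (ISAF) f_i, and we test an inbreeding-style deviation theta from HWE at theta = 0. This is a simplified score test that treats the ISAFs as known (it ignores the uncertainty from estimating them). It is an illustration under those assumptions, not RUTH's exact statistic, and the function name hwe_score_test is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def hwe_score_test(genotypes: Sequence[Optional[int]],
                   isaf: Sequence[float],
                   eps: float = 1e-6) -> Tuple[float, float]:
    """Score test of theta = 0 given individual-specific allele frequencies.

    Assumed model for a sample with allele frequency f:
        P(G=0) = (1-f)^2 + theta*f*(1-f)
        P(G=1) = 2*f*(1-f)*(1-theta)
        P(G=2) = f^2     + theta*f*(1-f)
    At theta = 0 each sample contributes Fisher information 1, so
    score**2 / n is approximately chi-square with 1 df under HWE.
    """
    if len(genotypes) != len(isaf):
        raise ValueError("genotypes and isaf must have the same length")
    score, n = 0.0, 0
    for g, f in zip(genotypes, isaf):
        if g is None:                      # skip missing genotype calls
            continue
        f = min(max(f, eps), 1.0 - eps)    # keep f away from 0 and 1
        # Add d/dtheta log P(G=g) evaluated at theta = 0.
        if g == 0:
            score += f / (1.0 - f)
        elif g == 1:
            score -= 1.0
        elif g == 2:
            score += (1.0 - f) / f
        else:
            raise ValueError("genotypes must be 0, 1, 2 or None")
        n += 1
    if n == 0:
        raise ValueError("no non-missing genotypes")
    stat = score * score / n
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value
```

If every f_i is the same, this is a score test against a single shared allele frequency. With ISAFs estimated from ancestry, genotype patterns that ancestry already explains no longer count as departures from HWE, which is the point of a structure-aware test.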
Thanks for the submission. It looks like all we need is a way to run logistic regression on an arbitrary array/ndarray, so I think this should be feasible to implement before too long.
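As a hedged sketch of what "logistic regression on an arbitrary ndarray" involves, the NumPy code below fits a binomial logistic model by Newton-Raphson (IRLS). It then uses that fit to estimate individual-specific allele frequencies from principal components, treating each genotype as a count out of 2 trials. Modeling the allele frequency with a logit link on PCs is an assumption made for illustration; RUTH's own frequency model may differ. The names logistic_regression and isaf_from_pcs are hypothetical and are not part of Hail's API.

```python
import numpy as np


def logistic_regression(X: np.ndarray, y: np.ndarray, trials: float = 1.0,
                        max_iter: int = 25, tol: float = 1e-8) -> np.ndarray:
    """Fit a binomial logistic regression with Newton-Raphson (IRLS).

    X: (n, k) design matrix, including an intercept column.
    y: (n,) success counts, each between 0 and `trials`.
    Returns the (k,) coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted success probabilities
        score = X.T @ (y - trials * mu)          # gradient of the log-likelihood
        w = trials * mu * (1.0 - mu)             # binomial variance weights
        info = X.T @ (X * w[:, None])            # Fisher information matrix
        step = np.linalg.solve(info, score)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def isaf_from_pcs(genotypes: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Estimate individual-specific allele frequencies from PCs.

    genotypes: (n,) alternate-allele counts in {0, 1, 2}.
    pcs: (n,) or (n, k) principal-component scores.
    """
    X = np.column_stack([np.ones(len(genotypes)), pcs])  # prepend intercept
    beta = logistic_regression(X, genotypes.astype(float), trials=2.0)
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


# Two groups separated by one PC: the fitted frequencies match each
# group's allele frequency (1/8 and 7/8).
g = np.array([0, 0, 1, 0, 2, 1, 2, 2])
pc1 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
f = isaf_from_pcs(g, pc1)
```

Newton-Raphson is a natural fit here because each iteration needs only matrix products and one linear solve, which are exactly the kind of ndarray operations a per-variant implementation would rely on.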
Has there been any progress on implementing a version of RUTH in Hail?
We’re still building out the ndarray infrastructure that will let us write hl.logistic_regression_rows in Python. I’m not sure of the exact timeline, but I’d hope to have the features needed to implement RUTH in Hail (in Python) done by the end of 2020.