Our group would like to perform logistic regression on rare variants across 10 or so moderate-sized data sets. To perform a score test that combines summary statistics over these data sets, we would need the components of the score test for each data set: the score function and the information matrix.
This is implemented in, for example, the RVTESTS and RAREMETAL software.
I’m not a statistician myself, but it looks like for single variant analysis, that software generated an information matrix which was a 1x1 matrix.
Can you add that feature to Hail? Or is there another way to output those components?
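For the single-variant case, here is a minimal sketch of why a 1x1 information matrix per data set is enough. It assumes each study reports a score U for the variant and its variance V (the 1x1 information), and that the studies are independent with no overlapping samples. It uses the standard fixed-effect combination T = (ΣU)² / ΣV, compared against a chi-square with 1 degree of freedom; we haven't checked that this matches RVTESTS or RAREMETAL line for line, and `meta_score_test` is a hypothetical helper name.

```python
import math


def meta_score_test(studies):
    """Combine per-study single-variant score statistics (fixed effect).

    studies: iterable of (U, V) pairs, where U is a study's score for the
    variant and V is its variance (the study's 1x1 information matrix).
    Assumes independent studies with no shared samples.
    Returns the combined 1-df chi-square statistic and its p-value.
    """
    u_total = 0.0
    v_total = 0.0
    for u, v in studies:
        u_total += u  # scores add across independent studies
        v_total += v  # so do their variances
    stat = u_total ** 2 / v_total
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Two hypothetical studies: (U, V) = (2, 4) and (1, 5).
stat, p = meta_score_test([(2.0, 4.0), (1.0, 5.0)])
print(stat, p)
```

Because only sums of U and V are needed, each site can share two numbers per variant rather than individual-level data.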
The Fisher information matrix and score vector are $k \times k$ and $k$-dimensional, respectively, where $k$ is the total number of covariates. We indeed compute these for logistic regression, but at the level of Java/Scala, and do not currently return the results because we don’t have a nice Hail matrix type for use in Table/MatrixTable at the level of Python. To fix this, we are adding an ndarray Hail type, which will also allow us to lift all regression code up to Python and expose all the intermediate values and logic. This will mature over the next quarter, but I’ll discuss with the team whether it makes sense to return an Array[Array[Float64]] in the meantime, since this would solve your problem.
The idea of lifting the regression code up to Python sounds like it would benefit others as well, for writing their own custom regression functions. My understanding is that we only need one element of the score vector and one element of the inverse of the Fisher information matrix, similar to how the Wald test returns one number for beta instead of the full beta vector. I’ve looked at LogisticRegressionModel.scala, and I think we’ll try modifying it to return the two extra values. If we run into questions, would the Hail dev forum be the right place to post them?
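A short sketch of why one element of each suffices. Write g for the variant's index and c for the remaining covariates. By block matrix inversion, [I⁻¹]_gg = 1 / (I_gg − I_gc I_cc⁻¹ I_cg), the variance of the variant's efficient score is V = 1 / [I⁻¹]_gg, and when U is evaluated at the null fit the covariate scores are zero, so Uᵀ I⁻¹ U reduces to U_g² [I⁻¹]_gg. The helpers `invert` and `variant_score_test` are hypothetical names, and this assumes I is small and well conditioned enough for plain Gauss-Jordan elimination.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | identity].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variant_score_test(score, info, g):
    """Single-variant score test from the full score vector and information.

    score: length-k score vector evaluated at the null fit.
    info: k x k Fisher information matrix.
    g: index of the variant's coefficient.
    Returns (U_g, V_g, statistic, p-value), with V_g = 1 / [I^-1]_gg.
    """
    inv_gg = invert(info)[g][g]
    u_g = score[g]
    stat = u_g * u_g * inv_gg  # U_g^2 / V_g, approximately chi-square with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return u_g, 1.0 / inv_gg, stat, p_value


# Covariate score is 0 (null fit); variant score is 3.
u_g, v_g, stat, p = variant_score_test([0.0, 3.0], [[2.0, 1.0], [1.0, 2.0]], g=1)
print(u_g, v_g, stat, p)
```

Here V_g = 1.5 matches the Schur complement 2 − 1·(1/2)·1, and the per-study pair (U_g, V_g) is exactly what a summary-statistic meta-analysis would consume.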
That’s exactly right. On our end, we’ll focus on the larger infrastructural change. In the meantime, we’re happy to answer questions if you try modifying it, and if the change is useful to others, go ahead and make a pull request! Post any questions here or in the Zulip chat.