Exporting intermediate components of score test


#1

Our group would like to run logistic regression on rare variants across 10 or so moderately sized data sets. To perform a score test that combines summary statistics across these data sets, we need the components of the score test for each data set: the score function and the information matrix.
This is implemented in, for example, the RVTESTS and RAREMETAL software.
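
To make the request concrete, here is a minimal sketch of the single-variant case, assuming a simple fixed-effects combination in which the per-data-set scores and 1x1 information values just add up. The function name `combine_single_variant` and the numbers are hypothetical and are not taken from RVTESTS, RAREMETAL, or Hail.

```python
import math


def combine_single_variant(scores, infos):
    """Combine per-data-set score statistics for one variant.

    scores: list of score values U_i, one per data set
    infos:  list of 1x1 information values V_i, one per data set
    Assumes the data sets are independent, so scores and information add.
    """
    total_score = sum(scores)
    total_info = sum(infos)
    # Score test statistic, chi-square with 1 degree of freedom under the null.
    stat = total_score ** 2 / total_info
    # Survival function of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return total_score, total_info, stat, p_value


# Hypothetical per-data-set values for a single variant.
U, V, stat, p = combine_single_variant([1.5, -0.5, 2.0], [1.0, 2.0, 1.0])
print(U, V, stat, p)
```

So all we would need from each data set is one score value and one information value per variant.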

I’m not a statistician myself, but it looks like for single-variant analysis that software generates a 1x1 information matrix.

Can you add that feature to Hail? Or is there another way to output those components?

Thanks!


#2

The Fisher information matrix is k x k and the score vector is k-dimensional, where k is the total number of covariates. We do compute these for logistic regression, but in Java/Scala, and we don’t currently return them because we don’t have a suitable Hail matrix type for use in Table/MatrixTable at the Python level. To fix this, we are adding an ndarray Hail type, which will also let us lift all of the regression code up to Python, exposing all the intermediate values and logic. This will mature over the next quarter, but I’ll discuss with the team whether it makes sense to return an Array[Array[Float64]] in the meantime, since that would solve your problem.
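
For reference, here is a rough sketch of the two quantities for a logistic model, written in plain Python so that the information matrix comes out as a nested list, similar in shape to an Array[Array[Float64]]. This is not Hail's internal code; the function name `logistic_score_and_information` and the toy data are assumptions for illustration only. It uses the standard formulas score = X^T (y - mu) and information = X^T W X with W = diag(mu (1 - mu)).

```python
import math


def logistic_score_and_information(X, y, beta):
    """Score vector and Fisher information for logistic regression at beta.

    X:    list of rows, each a list of k covariate values (include a 1.0
          column for the intercept)
    y:    list of 0/1 outcomes
    beta: list of k coefficients at which to evaluate
    Returns (score, info): a length-k list and a k x k nested list.
    """
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
        w = mu * (1.0 - mu)                # variance weight
        for a in range(k):
            score[a] += row[a] * (y_i - mu)
            for b in range(k):
                info[a][b] += w * row[a] * row[b]
    return score, info


# Toy data: intercept plus one genotype column, evaluated at beta = 0.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 0.0]]
y = [0, 1, 1, 0]
score, info = logistic_score_and_information(X, y, [0.0, 0.0])
print(score)
print(info)
```

In a score test these would be evaluated at the null-model fit rather than at zero; the sketch only shows the shapes and formulas involved.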