Exporting intermediate components of score test

pyspark_user · December 3, 2018, 8:30pm

Our group would like to run logistic regression on rare variants across roughly 10 moderate-sized data sets. To perform a score test that combines summary statistics over these data sets, we need the components of the score test from each data set: the score and the information matrix.
This is implemented in, for example, the RVTESTS and RAREMETAL software.
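For context, the usual way per-study score statistics are pooled for a single variant is to sum the scores U_i and their variances V_i (the 1x1 information) across studies and form T = (sum of U_i)^2 / (sum of V_i), which is approximately chi-squared with 1 degree of freedom under the null. The sketch below assumes independent samples across studies and the same allele coding in every study; `combine_single_variant_scores` is a hypothetical helper, not part of Hail, RVTESTS, or RAREMETAL.

```python
import math
from typing import Sequence, Tuple


def combine_single_variant_scores(
    scores: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool per-study score statistics U_i and variances V_i for one variant.

    Returns (chi2, p_value) for T = (sum U_i)^2 / (sum V_i), which is
    approximately chi-squared with 1 df under the null hypothesis.
    """
    if len(scores) != len(variances):
        raise ValueError("need exactly one variance per score")
    u_total = sum(scores)
    v_total = sum(variances)
    if v_total <= 0:
        raise ValueError("total information must be positive")
    chi2 = u_total ** 2 / v_total
    # Upper tail of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy numbers for three studies (not real data): T = 6^2 / 12 = 3.0
chi2, p = combine_single_variant_scores([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Summing U_i and V_i is what makes exporting only these two numbers per study sufficient for the single-variant case.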

I’m not a statistician myself, but it looks like, for single-variant analysis, that software generates a 1x1 information matrix.

Can you add that feature to Hail? Or is there another way to output those components?

Thanks!
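To make the 1x1 matrix concrete: for one variant with genotype vector g, after fitting the null model y ~ X, the score is U = g'(y - mu) and its variance is V = g'Wg - g'WX(X'WX)^{-1}X'Wg, where W = diag(mu(1 - mu)). V is the single-variant information after adjusting for the covariates. This is a minimal numpy sketch, assuming `mu` comes from an already-fitted null logistic model; `single_variant_score` is a hypothetical name.

```python
import numpy as np


def single_variant_score(g, X, y, mu):
    """Score statistic U and its 1x1 variance V for one variant.

    g  : (n,) genotype dosages for the tested variant
    X  : (n, k) covariates of the null model (include an intercept column)
    y  : (n,) binary phenotype coded 0/1
    mu : (n,) fitted probabilities from the null logistic model y ~ X
    """
    w = mu * (1.0 - mu)                 # diagonal of the weight matrix W
    u = float(g @ (y - mu))             # score for the variant's coefficient
    xtwx = X.T @ (X * w[:, None])       # X'WX, shape (k, k)
    xtwg = X.T @ (w * g)                # X'Wg, shape (k,)
    # g'Wg minus the part of the variant's information explained by covariates
    v = float(g @ (w * g) - xtwg @ np.linalg.solve(xtwx, xtwg))
    return u, v
```

The subtraction is a Schur complement, so V equals 1 / (I^{-1})_{gg} for the full information matrix of the model with g appended to X.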

jbloom · December 6, 2018, 12:16pm

The Fisher information matrix and score vector are k x k and k-dimensional, respectively, where k is the total number of covariates. We do compute these for logistic regression, but at the level of Java/Scala, and we do not currently return them because we don’t have a nice Hail matrix type for use in Table/MatrixTable at the level of Python. To fix this, we are adding an ndarray Hail type, which will also allow us to lift all regression code up to Python, exposing all the intermediate values and logic. This will mature over the next quarter, but I’ll discuss with the team whether it makes sense to return an Array[Array[Float64]] in the meantime, since that would solve your problem.
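For a logistic model with design matrix X (n x k), the score vector is X'(y - mu) and the Fisher information is X'WX with W = diag(mu(1 - mu)), which gives the k and k x k shapes described above. A Newton-Raphson fit, a common way to fit this model, uses exactly these two quantities at every step. This is a plain numpy sketch of the standard formulas, not Hail's Scala implementation; both function names are hypothetical.

```python
import numpy as np


def logistic_score_and_information(X, y, beta):
    """Score vector (k,) and Fisher information (k, k) of the logistic
    log-likelihood at coefficients beta, where k = X.shape[1]."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
    w = mu * (1.0 - mu)                     # Var(y_i) under the model
    score = X.T @ (y - mu)                  # gradient of the log-likelihood
    information = X.T @ (X * w[:, None])    # X'WX
    return score, information


def fit_logistic_newton(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns beta and the score/information at the fit."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        score, information = logistic_score_and_information(X, y, beta)
        step = np.linalg.solve(information, score)  # I^{-1} U
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    score, information = logistic_score_and_information(X, y, beta)
    return beta, score, information
```

At the fitted beta the score is numerically zero, which is why a score test evaluates these quantities at the null fit rather than at the full-model fit.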

pyspark_user · December 11, 2018, 8:42pm

Lifting the regression code up to Python sounds like it would also benefit others who want to write their own custom regression functions. My understanding is that we only need one element of the score vector and one element of the inverse of the Fisher information matrix, similar to how the Wald test returns one number for beta instead of the full beta vector. I’ve looked at LogisticRegressionModel.scala, and I think we’ll try modifying it to return the two extra values. If we run into questions, would the Hail dev forum be the right place to post them?

Thanks!
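Why one element of each is enough: if the score U and information I are evaluated at the null fit (tested coefficient j fixed at 0, the other coefficients at their restricted MLE), the score components for the other coefficients are zero, so the full quadratic form U'I^{-1}U collapses to U_j^2 (I^{-1})_{jj}. The per-study pair to export for meta-analysis is then U_j and V_j = 1 / (I^{-1})_{jj}. A small sketch; `score_test_one_coefficient` is a hypothetical helper.

```python
import numpy as np


def score_test_one_coefficient(score, information, j):
    """Score test for coefficient j from the full score vector and Fisher
    information evaluated at the null fit.

    Returns (u_j, v_j, chi2), where v_j = 1 / (I^{-1})_{jj} is the 1x1
    information that a meta-analysis would sum across studies.
    """
    score = np.asarray(score, dtype=float)
    information = np.asarray(information, dtype=float)
    inv_jj = np.linalg.inv(information)[j, j]  # one element of I^{-1}
    u_j = score[j]                             # one element of the score vector
    chi2 = u_j ** 2 * inv_jj                   # equals u_j^2 / v_j
    return u_j, 1.0 / inv_jj, chi2


# Toy values (not real data); nuisance score component is 0 as at a null fit.
u_j, v_j, chi2 = score_test_one_coefficient([0.0, 1.5], [[2.0, 1.0], [1.0, 2.0]], 1)
# returns approximately (1.5, 1.5, 1.5)
```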

jbloom · December 17, 2018, 2:05am

That’s exactly right. So I think on our end we’ll focus on the larger infrastructural change. But in the meantime, we’re happy to answer questions should you try modifying it, and if it’s a useful change for others, go ahead and make a pull request! If you run into questions, post them here or use the Zulip chat!

Related topics

Topic	Category	Replies	Views	Activity
Added Firth logistic regression	Updates	1	2936	April 1, 2017
Missing value and logistic regression	Hail Query & hailctl	5	790	October 2, 2020
Question about joint analysis in Hail (hl.MatrixTable.union_cols)	Hail Query & hailctl	2	526	March 1, 2019
Simple rare variant burden testing with Fisher exact test	Help [0.1]	0	1848	May 12, 2017
Announcing Hail 0.2!	Updates	2	4900	October 22, 2018