I am trying to perform linear regression in Hail, i.e.:
mt = mt.annotate_cols(
    lm=hl.agg.linreg(mt.y, [mt.x1, 1]))
but would like to make a small modification. I have two column fields, a and b, that I would like to incorporate in the following way:
- add one value, a, to the end of y
- add a second predictor variable, x2, that has 0 in all rows except the last one, where it equals b (corresponding to a at the end of y)
- add a 0 to the end of x1
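To make the intended construction concrete, here is a minimal sketch in plain Python (not Hail) for a single column. The toy values and the helper names `augment` and `ols` are hypothetical and only illustrate the augmented response and design matrix described in the list above; the fit uses ordinary least squares via the normal equations, which may differ numerically from what Hail does internally.

```python
# Toy data for ONE column of the MatrixTable (values are made up for illustration).
y = [1.0, 2.0, 2.5, 4.0]    # entry field y across rows
x1 = [0.5, 1.0, 1.5, 2.0]   # entry field x1 across rows
a, b = 3.0, 2.0             # column fields a and b for this column


def augment(y, x1, a, b):
    """Append one extra observation: y gets a, x1 gets 0, x2 is 0 except b at the end."""
    y_aug = y + [a]
    x1_aug = x1 + [0.0]
    x2_aug = [0.0] * len(y) + [b]
    # Predictor order mirrors the question: [x1, x2, intercept].
    X_aug = [[u, v, 1.0] for u, v in zip(x1_aug, x2_aug)]
    return y_aug, X_aug


def ols(y, X):
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting: move the largest remaining entry in column c to the diagonal.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


y_aug, X_aug = augment(y, x1, a, b)
beta = ols(y_aug, X_aug)  # coefficients for [x1, x2, intercept]
print(beta)
```

The point of the sketch is only the shape of the inputs: one extra observation per column, with x2 acting as a predictor that is nonzero for that observation alone.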
Naively, I would try to achieve this by adding a row field called “dummy” that is all zeros and running:
mt = mt.annotate_cols(
    lm=hl.agg.linreg(mt.y.append(mt.a), [mt.x1.append(0), mt.dummy.append(mt.b), 1]))
However, it appears that I cannot simply append a value within hl.agg.linreg. Another solution would be to add an entry containing a and b in the appropriate fields, but I also cannot find a simple way to add an entry to the MatrixTable.
Is there a simple way to achieve my goal? In case it helps clarify what I’m asking, the rough idea is that x1 corresponds to gene-level statistics that I am regressing on y, and x2 corresponds to a genome-wide statistic that I am trying to fit simultaneously.
I realize the above may be vague; apologies for any unclear parts or omitted key information.