To give users more control, we’ve changed inclusion of an intercept from implicit to explicit in linear regression, logistic regression, and SKAT. Consider the linear regression model
y = b*x + b0 + b1*c1 + b2*c2 + e
where we are interested in the effect size b on x per row of a matrix table mt, and b0 represents an intercept.
BEFORE, the intercept was added implicitly, so you would write:
hl.linear_regression(y=mt.y, x=mt.x, covariates=[mt.c1, mt.c2]).
NOW, the intercept must be included explicitly if desired, so you should write:
hl.linear_regression(y=mt.y, x=mt.x, covariates=[1.0, mt.c1, mt.c2]).
Note that 1.0 is just a numeric expression (not special syntax) corresponding to a covariate that is 1.0 for every sample. This is equivalent to the model above, thought of as:
y = b*x + b0*1.0 + b1*c1 + b2*c2 + e
In sum: to get the same behavior as before, just add the covariate 1.0.
WARNING: The first command will still run but will give different results, since it now corresponds to the model without an intercept:
y = b*x + b1*c1 + b2*c2 + e.
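To see why dropping the column of ones changes the estimates, here is a minimal NumPy sketch (not the Hail API) comparing the two design matrices. The simulated data, coefficients, and variable names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
c1 = rng.normal(size=n)
c2 = rng.normal(size=n)
# Simulate y with a nonzero intercept of 2.0.
y = 0.5 * x + 2.0 + 0.3 * c1 - 0.2 * c2 + rng.normal(scale=0.1, size=n)

# Design matrix WITH a column of ones, analogous to covariates=[1.0, mt.c1, mt.c2].
X_with = np.column_stack([x, np.ones(n), c1, c2])
b_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)

# Design matrix WITHOUT the intercept column, analogous to covariates=[mt.c1, mt.c2].
X_without = np.column_stack([x, c1, c2])
b_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)

print(b_with[0])     # estimate of b, close to the true 0.5
print(b_with[1])     # estimate of the intercept b0, close to 2.0
print(b_without[0])  # estimate of b from the no-intercept model
```

The no-intercept fit has nothing to absorb the constant 2.0, so its residuals and coefficient estimates differ from the intercept model's.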
As another example, simple linear regression
y = b*x + b0 + e
now corresponds to
hl.linear_regression(y=mt.y, x=mt.x, covariates=[1.0]).
We’ve also removed the empty default value for covariates, so to do even simpler linear regression (not even an intercept!)
y = b*x + e
explicitly write the empty list [] in
hl.linear_regression(y=mt.y, x=mt.x, covariates=[]).
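For intuition, covariates=[] fits a regression through the origin, which has a simple closed form. A sketch in plain NumPy (the data values here are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Through-the-origin fit of y = b*x + e, i.e. no intercept column at all:
# the OLS solution reduces to sum(x*y) / sum(x*x).
b = np.sum(x * y) / np.sum(x * x)
print(b)  # 1.99
```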
These changes also make the regression interface more consistent with the linreg aggregator and the LinearMixedModel class.