Categorical covariates in association analyses

Robert · January 11, 2018, 9:18pm

At the moment you can only include continuous or boolean covariates. It would be great if categorical variables could also be used as covariates. It’s just a matter of turning each unique value into a boolean dummy variable, but it’s a common use case, so it would be nice not to have to do it manually.
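For anyone landing here, the manual step described above looks roughly like this. It is a plain-Python sketch, not Hail code: the `one_hot` helper and the `batch` values are made up for illustration.

```python
def one_hot(values):
    """Return one boolean column per unique value, keyed by that value."""
    levels = sorted(set(values))
    return {level: [v == level for v in values] for level in levels}


# Hypothetical categorical covariate with three levels.
batch = ["A", "B", "A", "C"]
dummies = one_hot(batch)
# dummies["A"] is True exactly where batch equals "A", and so on.
```

Each resulting boolean column could then be passed as an ordinary covariate.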

tpoterba · January 11, 2018, 10:05pm

This is a great idea. I’m not sure exactly what it should look like, maybe a method that one-hot-encodes a given field and returns a new dataset.

danking · January 16, 2018, 5:45pm

Enumeration types are a good fit for this use case.

jjfarrell · February 20, 2019, 3:30pm

It would be great to have a Hail version of this pandas function for adding dummy variables to a Hail table/matrix table.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Or some way to include these in a regression model automatically. The reference category would need to be specified in that case. So something like this for APOE, dummy(mt.pheno.apoe, ref='33', prefix='e'), would expand to e22, e23, e24, e33, e34, e44 covariates, with e33 omitted from the model as the reference.
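To make the request concrete, here is a minimal plain-Python sketch of what such a `dummy` helper might do. It is not an existing Hail function: the name, the `ref` and `prefix` parameters, and the sample genotypes are assumptions taken from the example above, and it works on a Python list rather than a Hail expression.

```python
def dummy(values, ref, prefix=""):
    """Build 0/1 indicator columns for every level except the reference."""
    levels = sorted(set(values))
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} not found in values")
    return {
        f"{prefix}{level}": [int(v == level) for v in values]
        for level in levels
        if level != ref  # the reference category gets no column
    }


# Hypothetical APOE genotypes for seven samples.
apoe = ["33", "34", "22", "33", "44", "23", "24"]
covariates = dummy(apoe, ref="33", prefix="e")
# Columns: e22, e23, e24, e34, e44 (e33 is left out as the reference).
```

Leaving out the reference level keeps the indicator columns from summing to the intercept column, so each remaining coefficient is read as an effect relative to e33.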
