Categorical covariates in association analyses


#1

At the moment, only continuous or boolean covariates can be included. It would be great if categorical variables could also be used as covariates. It’s just a matter of turning each unique value into a boolean dummy variable, but it’s a common use case, so it would be nice not to have to do it manually.


#2

This is a great idea. I’m not sure exactly what it should look like; maybe a method that one-hot encodes a given field and returns a new dataset.
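As a rough illustration of the behaviour such a method might have, here is a minimal pure-Python sketch that works on a list of dicts rather than a Hail dataset. The function name `one_hot_encode` and the `<field>_<level>` column naming are hypothetical choices, not an existing Hail API. It assumes the field's values are mutually comparable so they can be sorted.

```python
from typing import Any, Dict, List


def one_hot_encode(rows: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Return new rows with one boolean dummy column per distinct value of `field`.

    Hypothetical helper: the input rows are left unmodified. A missing value
    (None) yields None in every dummy column for that row.
    """
    # Distinct non-missing levels, sorted so the column order is stable.
    levels = sorted({row[field] for row in rows if row[field] is not None})
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy, so we return a new dataset
        value = row[field]
        for level in levels:
            new_row[f"{field}_{level}"] = None if value is None else value == level
        encoded.append(new_row)
    return encoded
```

For example, `one_hot_encode([{"s": "A", "apoe": "33"}, {"s": "B", "apoe": "34"}], "apoe")` adds `apoe_33` and `apoe_34` columns, with `True` in the column matching each sample's value.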


#3

Enumeration types are a good fit for this use case.


#4

It would be great to have a Hail version of this pandas function for adding dummy variables to a Hail Table/MatrixTable:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Or some way to include these in a regression model automatically. In that case, the reference category would need to be specified. For APOE, something like dummy(mt.pheno.apoe, ref='33', prefix='e') would expand to the covariates e22, e23, e24, e33, e34 and e44, with e33 omitted from the model as the reference.
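To make the proposed semantics concrete, here is a pure-Python sketch of what a `dummy(values, ref=..., prefix=...)` expansion could return. This is a hypothetical helper mirroring the syntax above, not an existing Hail function, and it operates on a plain list instead of a Hail expression. It produces 0/1 float covariates for every level except the reference. Missing values stay missing.

```python
from typing import Dict, List, Optional, Sequence


def dummy(
    values: Sequence[Optional[str]], ref: str, prefix: str = ""
) -> Dict[str, List[Optional[float]]]:
    """Expand a categorical column into 0/1 covariates, omitting the reference level.

    Hypothetical helper: returns {covariate name: column of 0.0/1.0/None}.
    """
    levels = sorted({v for v in values if v is not None})
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} not found among {levels}")
    covariates: Dict[str, List[Optional[float]]] = {}
    for level in levels:
        if level == ref:
            continue  # the reference level is absorbed into the intercept
        covariates[f"{prefix}{level}"] = [
            None if v is None else float(v == level) for v in values
        ]
    return covariates
```

With `apoe = ["33", "34", "44", "33", "23", "22", "24"]`, `dummy(apoe, ref="33", prefix="e")` yields the covariates `e22`, `e23`, `e24`, `e34` and `e44`, and the e33 carriers are all zeros. The reference is dropped because including an indicator for every level alongside an intercept makes the covariates perfectly collinear. pandas' `get_dummies(..., drop_first=True)` drops the first level rather than a chosen one, so picking a specific reference like `'33'` there means dropping that column yourself.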