Categorical covariates in association analyses

At the moment, only continuous or boolean covariates can be included. It would be great if categorical variables could also be used as covariates. It’s just a matter of turning each unique value into a boolean dummy variable, but this is a common use case, so it would be nice not to have to do it manually.
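For reference, the manual conversion described here can be sketched in plain Python. This is only an illustration of the idea, not Hail code: the helper name `one_hot` and the sample data are made up, and it assumes the field has no missing values.

```python
def one_hot(values):
    """Return one boolean dummy column per unique value (hypothetical helper)."""
    levels = sorted(set(values))  # assumes no missing values and a sortable type
    return {level: [v == level for v in values] for level in levels}


# Made-up categorical field with three levels.
batch = ["a", "b", "a", "c"]
dummies = one_hot(batch)
```

Each key of `dummies` is one level of the field, and each value is the boolean column that would have to be added as a covariate by hand.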

This is a great idea. I’m not sure exactly what it should look like; maybe a method that one-hot-encodes a given field and returns a new dataset.

Enumeration types are a good fit for this use case.

It would be great to have a Hail version of this pandas function for adding dummy variables to a Hail Table or MatrixTable.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Or some way to include these in a regression model automatically. The reference category would need to be specified in that case. So, for APOE, something like dummy(mt.pheno.apoe, ref='33', prefix='e') would expand to the covariates e22, e23, e24, e33, e34, and e44, with e33 omitted from the model as the reference.
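To make the proposed behaviour concrete, here is a rough plain-Python sketch of such a `dummy` helper. It is hypothetical and not an existing Hail function: it works on a list rather than a Hail expression, the genotype values are invented, and keeping missing values as missing is an assumption about what a regression would want.

```python
def dummy(values, ref, prefix=""):
    """Expand a categorical field into 0/1 covariates, omitting the reference level.

    Hypothetical helper mirroring the proposed dummy(field, ref=..., prefix=...).
    """
    levels = sorted({v for v in values if v is not None})
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} not found in the data")
    return {
        f"{prefix}{level}": [None if v is None else int(v == level) for v in values]
        for level in levels
        if level != ref  # the reference level gets no column
    }


# Invented APOE genotypes, one per sample.
apoe = ["33", "34", "22", "33", "44", "23", "24"]
covariates = dummy(apoe, ref="33", prefix="e")
```

With this data, `covariates` has the columns e22, e23, e24, e34 and e44. Samples with genotype 33 are 0 in every column, which is what makes e33 the baseline the other coefficients are compared against.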