Association analysis covariate

Hi, I have a covariate, RACE, that is a string (categorical) field. I know Hail currently doesn’t support this type, so what’s the syntax to convert that column in my samples_table from string to numeric? Thanks.

Suppose the matrix table mt has a column field RACE of type String that can take three values: "EUR", "ASN", "AFR". Since Hail internally adds an intercept covariate (which takes the value 1 for every sample), you’ll only want to add two covariates to account for RACE. Were you to add three, the three indicator columns would sum to the intercept column for every sample, so the design matrix would be rank-deficient (singular) and the regression could not be fit.
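To see the singularity concretely, here is a small sketch in plain Python (no Hail involved). The sample labels are made up for illustration, and matrix_rank and design are hypothetical helper names, not part of any library. It builds the design matrix with an intercept column plus either three or two indicator columns and compares their ranks.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Made-up RACE labels for six samples (illustration only).
race = ["EUR", "ASN", "AFR", "EUR", "AFR", "ASN"]

def design(levels):
    """Intercept column followed by one 0/1 indicator column per listed level."""
    return [[1] + [int(r == lvl) for lvl in levels] for r in race]

# Intercept + three indicators: 4 columns.
rank_three = matrix_rank(design(["EUR", "ASN", "AFR"]))
# Intercept + two indicators: 3 columns.
rank_two = matrix_rank(design(["EUR", "ASN"]))
print(rank_three, rank_two)
```

With three indicators the matrix has four columns but a lower rank, because the indicator columns add up to the intercept column. With two indicators the rank equals the number of columns, so the coefficients are identifiable; the omitted category (AFR here) becomes the reference level.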

The simplest approach is a dummy encoding that represents each category except one reference category (here "AFR") as a Boolean field. You could add these column fields to mt with annotate_cols if you’ll reuse them elsewhere in your analyses. But if you only want to use them in a regression, you might as well just create them on the fly. In 0.2 syntax, this looks like:

mt = hl.linear_regression(y=..., x=..., covariates=[mt.RACE == 'EUR', mt.RACE == 'ASN'])