Linear/Logistic regression covariates

I have a question about the accepted (or most efficient) way to type-cast the covariates for a linear or logistic regression.

I see that covariates accepts a list of Float64Expression.

I have a mixture of int/float covariates and categorical (non-dichotomous) covariates. I have converted each categorical trait into a series of dichotomous dummy variables.

Is this the best way to handle these variables? Or is it better to pass each one as a single int, where each integer value is a different factor level? Or is it possible to pass these variables in their original string format?

For one of my runs I got an “Error summary: HailException: Failed to fit logistic regression null model (standard MLE with covariates only): exploded at Newton iteration 10” error, and I’m worried that the increase in the number of covariates, caused by expanding the categorical variables into several dummy variables, was a factor in the regression not converging.

We intend to build a nicer interface for categorical/factor variables at some point, but dummy-coding is the correct thing to do for now.
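For anyone landing here later, this is a minimal plain-Python sketch of what dummy-coding means, not Hail's API. The helper `dummy_code` and the `batch` values are made up for illustration. It assumes no missing values, and it drops one reference level so that the remaining indicator columns are not collinear with an intercept term.

```python
def dummy_code(values, reference=None):
    """Return {level: column} with one 0.0/1.0 column per non-reference level.

    Hypothetical helper for illustration; assumes `values` has no missing entries.
    """
    levels = sorted(set(values))
    if reference is None:
        # Arbitrary choice: use the first level in sorted order as the baseline.
        reference = levels[0]
    # Keep k - 1 of the k levels; the reference level is encoded as all zeros.
    kept = [level for level in levels if level != reference]
    return {
        level: [1.0 if value == level else 0.0 for value in values]
        for level in kept
    }


# Made-up categorical covariate with three levels.
batch = ["A", "B", "C", "B", "A"]
dummies = dummy_code(batch, reference="A")

for level, column in dummies.items():
    print(level, column)
```

In Hail the same idea would presumably be written as one float expression per kept level, for example something like hl.float(mt.batch == 'B') for a hypothetical string field mt.batch, and each of those expressions goes into the covariates list.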

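With respect to the Newton iteration exploding, I’m guessing that this means your phenotype is perfectly separable (predictable) from the covariates alone, so there’s no additional signal left for the genotypes to explain. When that happens the maximum-likelihood estimate for the null model does not exist: the coefficients keep growing at every iteration instead of converging.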
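To illustrate why separation prevents convergence, here is a toy sketch in plain Python. It is not Hail's fitting code, and the data are made up: a one-coefficient logistic model with no intercept, fit by Newton's method on data where the sign of x predicts y perfectly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def newton_path(xs, ys, n_iter=10):
    """Run Newton's method for a one-coefficient logistic model (no intercept).

    Returns the coefficient value before the first step and after every step.
    """
    beta = 0.0
    path = [beta]
    for _ in range(n_iter):
        ps = [sigmoid(beta * x) for x in xs]
        # Gradient of the log-likelihood with respect to beta.
        grad = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Observed information (negative second derivative).
        info = sum(x * x * p * (1.0 - p) for x, p in zip(xs, ps))
        beta += grad / info
        path.append(beta)
    return path


# Perfectly separable toy data: y is 1 exactly when x is positive.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
path = newton_path(xs, ys)

for iteration, beta in enumerate(path):
    print(iteration, beta)
```

On this data the coefficient increases at every step and never settles, because a larger coefficient always fits the separated points a little better. If something like this is happening in your null model, checking whether one covariate (or one dummy level) lines up exactly with case/control status would be a reasonable first thing to look at.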
