Linear/Logistic regression covariates

acererak · August 18, 2020, 4:53pm

I have a question about the accepted (or most efficient) way to type and format the covariates in a linear or logistic regression.

I see that covariates accepts a list of Float64Expression.

I have a mixture of int/float covariates and categorical (non-dichotomous) covariates. I have converted each categorical trait into a series of dichotomous dummy variables.

Is this the best way to handle these variables? Or is it better to pass them as an int where each integer is a different factor? Or is it possible to pass these variables in the original string format?

For one of my runs I got an “Error summary: HailException: Failed to fit logistic regression null model (standard MLE with covariates only): exploded at Newton iteration 10” error, and I’m worried that the increase in the number of covariates, caused by expanding the categorical variables into several dummy variables, was a factor in the regression not converging.

tpoterba · August 18, 2020, 11:19pm

We intend to build a nicer interface for categorical/factor variables at some point, but dummy-coding is the correct thing to do for now.
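As a rough illustration of dummy-coding, here is a minimal plain-Python sketch (not Hail code). The function name `dummy_code`, the `site` values, and the choice of the alphabetically first level as the baseline are all made up for the example. It drops one reference level, since keeping an indicator for every level alongside an intercept makes the covariates collinear.

```python
def dummy_code(values, reference=None):
    """Return (kept_levels, rows): one 0.0/1.0 indicator per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: the first level alphabetically is the baseline.
        reference = levels[0]
    # Drop the reference level so the indicators are not collinear with an intercept.
    kept = [lv for lv in levels if lv != reference]
    rows = [[1.0 if v == lv else 0.0 for lv in kept] for v in values]
    return kept, rows


site = ["A", "B", "C", "B", "A"]  # hypothetical categorical covariate
kept, rows = dummy_code(site)
print(kept)     # ['B', 'C']
print(rows[1])  # [1.0, 0.0]
```

In Hail itself, each indicator would presumably be written as a float expression, something like `hl.float64(mt.pheno.site == 'B')`, where `mt.pheno.site` is a hypothetical string field, and passed in the `covariates` list.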

With respect to the Newton iteration exploding, I’m guessing that this means your phenotype is perfectly separable (predictable) from the covariates alone, so there’s no additional signal in the genotypes.
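One quick way to look for this is to cross-tabulate each categorical covariate against the phenotype. If every sample in some level is a case (or every one is a control), the coefficient of that level's dummy variable has no finite maximum-likelihood estimate. The sketch below is plain Python with made-up data and a hypothetical helper name, `separating_levels`. It only catches separation by a single level, not separation that arises from a combination of covariates.

```python
from collections import Counter


def separating_levels(covariate, phenotype):
    """Return covariate levels whose samples are all cases (1) or all controls (0)."""
    counts = Counter(zip(covariate, phenotype))
    flagged = []
    for level in sorted(set(covariate)):
        cases = counts[(level, 1)]
        controls = counts[(level, 0)]
        # A level with no cases or no controls predicts the phenotype perfectly.
        if cases == 0 or controls == 0:
            flagged.append(level)
    return flagged


site = ["A", "A", "B", "B", "C", "C"]  # hypothetical categorical covariate
is_case = [1, 0, 1, 0, 1, 1]           # hypothetical binary phenotype
print(separating_levels(site, is_case))  # ['C']
```

Rare levels are the usual culprits after dummy-coding, so merging very small categories before fitting may help.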
