Rewrite R glm in hail

Hi hail team!

I have a very basic question. I’m trying to rewrite the following R code (from Kaitlin’s MPC code) using hail:

glm(pop_v_path ~ obs_exp + mis_badness3 + obs_exp:mis_badness3 + polyphen2 + obs_exp:polyphen2, data=cleaned_joint_exac_clinvar.scores, family=binomial)

Is there an easy way to write this formula?


My R reading isn’t great: is this a linear regression? Or is this more complicated than that?

I think this is a logistic regression. I saw in the R docs that the : operator has a specific meaning in model formulas (as used by glm):

A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second.
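For numeric covariates like these, a first:second term is just the product of the two columns. With family=binomial and R's default logit link, the call above therefore fits the model below. This is the standard logistic GLM written out, assuming pop_v_path is coded 0/1; it is not stated in the thread:

```latex
\operatorname{logit} \Pr(\text{pop\_v\_path} = 1)
  = \beta_0 + \beta_1\,\text{obs\_exp} + \beta_2\,\text{mis\_badness3}
  + \beta_3\,\text{obs\_exp}\cdot\text{mis\_badness3}
  + \beta_4\,\text{polyphen2} + \beta_5\,\text{obs\_exp}\cdot\text{polyphen2}
```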

Do you know if there is an equivalent for this in Python?

Ok, so I think obs_exp:mis_badness3 in R is the same as obs_exp * mis_badness3 in Python. In an R formula, obs_exp * mis_badness3 expands to obs_exp + mis_badness3 + obs_exp:mis_badness3, which in Python would be obs_exp + mis_badness3 + obs_exp * mis_badness3.
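As a minimal sketch of that translation, the hypothetical helper below expands one observation into the design-matrix columns R builds from the formula, including the intercept R adds by default. It is plain Python, not Hail code:

```python
def design_row(obs_exp: float, mis_badness3: float, polyphen2: float) -> list:
    """Expand one observation into the columns of the R formula
    pop_v_path ~ obs_exp + mis_badness3 + obs_exp:mis_badness3
                 + polyphen2 + obs_exp:polyphen2

    For numeric variables, R's `a:b` term is the elementwise product a * b.
    """
    return [
        1.0,                     # (Intercept), added by R by default
        obs_exp,                 # obs_exp
        mis_badness3,            # mis_badness3
        obs_exp * mis_badness3,  # obs_exp:mis_badness3
        polyphen2,               # polyphen2
        obs_exp * polyphen2,     # obs_exp:polyphen2
    ]


print(design_row(2.0, 3.0, 0.5))  # prints [1.0, 2.0, 3.0, 6.0, 0.5, 1.0]
```

In Hail the same interaction columns could be built as expressions, e.g. ht.obs_exp * ht.mis_badness3 inside Table.annotate, assuming a table ht with those field names.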


thank you!!

Hello! Circling back to this: is there a way to do a logistic regression in Hail? I think these two doc pages are the most relevant: Hail | Aggregators and Hail | Statistics.

I’m hoping to run a logistic regression in an aggregation (ideally something like hl.agg.logreg), is that possible with the existing functionality? Maybe I’ve missed something in the docs?
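For context on what "in an aggregation" requires: an aggregator folds each row into a fixed-size state in a single pass. Linear regression fits that pattern (Hail already has an hl.agg.linreg aggregator) because ordinary least squares only needs the running sums X^T X and X^T y. The pure-Python sketch below illustrates that idea; the function names and toy data are hypothetical and this is not Hail's implementation:

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_single_pass(rows, k):
    """Ordinary least squares in ONE pass over (x, y) rows with k covariates.

    Each row only updates X^T X and X^T y, whose size depends on k and not
    on the number of rows, so the state stays fixed-size like an aggregator's.
    """
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in rows:  # the single pass
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)  # normal equations: X^T X b = X^T y


# Toy data with an intercept column: y = 1 + 2 * x exactly.
rows = [([1.0, float(x)], 1.0 + 2.0 * x) for x in range(5)]
print(ols_single_pass(rows, 2))  # prints [1.0, 2.0]
```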

I’d appreciate any tips – thanks in advance!

This isn’t possible right now. Fitting a logistic regression is a convex optimization problem that has to be solved iteratively, and there are no good options for doing this in a single pass over the data (which is what Hail aggregators require). We intend to support doing this on a table, but don’t have a timeline right now.
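To make the multi-pass point concrete, here is a minimal pure-Python sketch of Newton's method for logistic regression (for the logit link this coincides with the iteratively reweighted least squares that R's glm uses). The function names and toy data are hypothetical, and this is not Hail code. Unlike the linear case, the gradient and Hessian depend on the current coefficients, so every iteration needs a fresh pass over all rows:

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_newton(rows, k, tol=1e-10, max_iter=50):
    """Logistic regression by Newton-Raphson. Returns (beta, passes)."""
    beta = [0.0] * k
    passes = 0
    for _ in range(max_iter):
        grad = [0.0] * k                      # X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # X^T W X, the negative Hessian
        for x, y in rows:                     # one full pass per iteration
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        passes += 1
        step = solve(info, grad)              # Newton step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, passes


# Toy data: P(y=1) is 1/3 when x=0 and 2/3 when x=1, so the MLE is
# intercept = log(1/2) and slope = 2 * log(2).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
rows = [([1.0, float(x)], float(y)) for x, y in data]
beta, passes = logistic_newton(rows, 2)
print(beta, passes)
```

Here passes comes out greater than 1, whereas the least-squares fit needs exactly one pass; that gap is why a single-pass aggregator does not fit logistic regression.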

gotcha, thank you for letting me know!