[Breaking change] Lmmreg Changes + Kinship Matrix

As of today, the lmmreg function takes a different first argument. Previously, a call to lmmreg looked like this:

lmm_vds = assoc_vds.lmmreg(kinship_vds, 'sa.pheno', ['sa.cov1', 'sa.cov2'])

Now, rather than requiring kinship_vds, lmmreg requires an instance of a new Python class called KinshipMatrix. Currently, the only way to create a KinshipMatrix in Hail is to call the new rrm() (Realized Relationship Matrix) function on a vds. For example, to reproduce the call above, you’d do the following:

km = kinship_vds.rrm()
lmm_vds = assoc_vds.lmmreg(km, 'sa.pheno', ['sa.cov1', 'sa.cov2'])

There are a few advantages to this change. First, if you want to run lmmreg several times with the same samples and variants but with a different phenotype and/or different covariates, you can reuse the same KinshipMatrix across calls. When lmmreg is called, Hail will automatically filter out samples in the KinshipMatrix that are missing the relevant annotations for the specified covariates or phenotypes, just as it did with the old kinship_vds.
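
A minimal sketch of that reuse pattern, using only the rrm() and lmmreg() calls shown in this post. The helper name lmmreg_per_phenotype and the annotation paths 'sa.pheno1' and 'sa.pheno2' are hypothetical, chosen purely for illustration:

```python
def lmmreg_per_phenotype(assoc_vds, km, phenotypes, covariates):
    """Run lmmreg once per phenotype, reusing one KinshipMatrix.

    Hypothetical helper: assoc_vds and km are assumed to be the objects
    from the example above; only the lmmreg call itself comes from Hail.
    """
    results = {}
    for pheno in phenotypes:
        # The same km object is passed on every call.
        results[pheno] = assoc_vds.lmmreg(km, pheno, covariates)
    return results


# Usage (requires a running Hail session; annotation names are made up):
# km = kinship_vds.rrm()
# by_pheno = lmmreg_per_phenotype(
#     assoc_vds, km, ['sa.pheno1', 'sa.pheno2'], ['sa.cov1', 'sa.cov2'])
```

Each entry of the returned dict is whatever lmmreg returns for that phenotype, so the results can be inspected side by side.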

Second, whereas the old lmmreg always used the RRM to compute kinship, you are now free to try other kinds of kinship matrix, such as GRM or IBD matrices. Currently, GRM and IBD don’t return a KinshipMatrix, but that change will be pushed out shortly.
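
For intuition about what a kinship matrix holds, here is a rough sketch in plain Python. It is not Hail code. It follows one common definition of the RRM (standardize each variant column, then take K = M * M^T / m) on a made-up toy genotype table. The function name, the toy data, and the assumption of no missing calls are all illustrative; check the rrm() documentation for the exact normalization Hail uses.

```python
import math


def realized_relationship_matrix(genotypes):
    """Compute K = M * M^T / m from an n-samples by m-variants table.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of sample i
    at variant j. Each variant column is standardized to mean 0 and
    variance 1 to form M. Assumes no missing calls and that every variant
    is polymorphic (otherwise the column variance would be zero).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        mean = sum(column) / float(n)
        sd = math.sqrt(sum((g - mean) ** 2 for g in column) / float(n))
        for i in range(n):
            standardized[i][j] = (column[i] - mean) / sd
    # Entry (a, b) is the average product of standardized genotypes.
    return [
        [sum(x * y for x, y in zip(standardized[a], standardized[b])) / m
         for b in range(n)]
        for a in range(n)
    ]


# Toy data: 3 samples (rows) by 3 variants (columns), values invented.
toy = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
]
kinship = realized_relationship_matrix(toy)
```

The result is a symmetric samples-by-samples matrix. Other kinship estimators fill in the same shape of matrix with different entries, which is why lmmreg can accept them interchangeably once they are wrapped in a KinshipMatrix.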

Finally, by calling matrix() on a KinshipMatrix instance, you can get the Spark IndexedRowMatrix that backs it and use any of the tools available in pyspark to analyze the matrix.
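
As a small sketch of what that might look like, assuming matrix() hands back a pyspark IndexedRowMatrix as described (numRows(), numCols() and rows are standard IndexedRowMatrix members; the helper name kinship_dimensions is hypothetical):

```python
def kinship_dimensions(km):
    """Return (rows, cols) of the matrix backing a KinshipMatrix.

    Assumes km.matrix() returns a pyspark IndexedRowMatrix, as described
    above; numRows() and numCols() are IndexedRowMatrix methods.
    """
    irm = km.matrix()
    return irm.numRows(), irm.numCols()


# Usage (requires a running Hail session):
# km = kinship_vds.rrm()
# n_rows, n_cols = kinship_dimensions(km)
# first_row = km.matrix().rows.first()  # an IndexedRow: .index and .vector
```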

See the documentation for lmmreg for more information.