As of today, there’s been a small change to the lmmreg function. Previously, a call to lmmreg looked like this:
lmm_vds = assoc_vds.lmmreg(kinship_vds, 'sa.pheno', ['sa.cov1', 'sa.cov2'])
Now, rather than requiring kinship_vds, lmmreg requires an instance of a new Python class called KinshipMatrix. Currently, the only way to create a KinshipMatrix in Hail is to call the new rrm() (Realized Relationship Matrix) function on a vds. For example, to get the same result as the code above, you’d do the following:
km = kinship_vds.rrm()
lmm_vds = assoc_vds.lmmreg(km, 'sa.pheno', ['sa.cov1', 'sa.cov2'])
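For readers unfamiliar with the term, here is a rough pure-Python sketch of what a realized relationship matrix is: standardize each variant across samples, then average the products of standardized genotypes for every pair of samples. This follows the common textbook definition and is not Hail's implementation; the function name, the list-of-lists input format, and the handling of monomorphic variants are assumptions made for illustration, and Hail's exact normalization and treatment of missing genotypes may differ.

```python
import math


def realized_relationship_matrix(genotypes):
    """Illustrative RRM (hypothetical helper, not part of Hail).

    genotypes: one list per sample, holding alternate-allele counts
    (0, 1 or 2) per variant, with no missing values.
    Returns an n x n list of lists, where n is the number of samples.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[0.0] * m for _ in range(n)]
    used = 0  # number of variants that actually vary across samples

    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        mean = sum(column) / float(n)
        variance = sum((g - mean) ** 2 for g in column) / float(n)
        if variance == 0:
            continue  # a monomorphic variant carries no kinship signal
        sd = math.sqrt(variance)
        for i in range(n):
            standardized[i][j] = (column[i] - mean) / sd
        used += 1

    if used == 0:
        raise ValueError("no polymorphic variants to build a kinship matrix from")

    # K = (1 / used) * M * M^T, where M is the standardized genotype matrix.
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[k])) / float(used)
            for k in range(n)
        ]
        for i in range(n)
    ]
```

With this normalization the diagonal entries average to 1, and samples with similar genotypes get larger off-diagonal values than samples with dissimilar ones.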
There are a few advantages to this change. The first is that if you want to run lmmreg several times with the same samples and variants but with a different phenotype and/or different covariates, you can reuse the same KinshipMatrix. When lmmreg is called, Hail will automatically filter out samples in the KinshipMatrix that are missing annotations for the specified phenotype or covariates, just as it did with the old kinship_vds.
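As a small sketch of the reuse described above, the helper below computes the kinship matrix once and passes it to lmmreg for each phenotype. It uses only the rrm() and lmmreg(km, phenotype, covariates) calls shown earlier in this post; the helper name and the 'sa.pheno2' annotation in the usage comment are hypothetical.

```python
def lmmreg_for_each_phenotype(assoc_vds, kinship_vds, phenotypes, covariates):
    """Run lmmreg once per phenotype, sharing a single KinshipMatrix.

    Hypothetical convenience wrapper: assoc_vds and kinship_vds are assumed
    to be Hail datasets exposing the lmmreg() and rrm() methods shown above.
    """
    # Compute the kinship matrix a single time.
    km = kinship_vds.rrm()

    results = {}
    for pheno in phenotypes:
        # Each call reuses km; samples missing this phenotype or any of the
        # covariates are dropped by lmmreg itself.
        results[pheno] = assoc_vds.lmmreg(km, pheno, covariates)
    return results


# Example usage ('sa.pheno2' is a made-up second phenotype annotation):
# results = lmmreg_for_each_phenotype(
#     assoc_vds, kinship_vds, ['sa.pheno', 'sa.pheno2'], ['sa.cov1', 'sa.cov2'])
```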
The second advantage is that, unlike the old lmmreg, where you had to use the RRM to compute kinship, users are now free to try out different kinds of kinship matrix, such as the GRM or IBD matrices. Currently, GRM and IBD don’t return a KinshipMatrix, but that change will be pushed out shortly.
Finally, by calling matrix() on a KinshipMatrix, you can get the Spark IndexedRowMatrix that backs it and use any of the tools available in PySpark to analyze the matrix.
See the documentation for lmmreg for more information.