As of today, there’s been a small change to the lmmreg function. Previously, a call to lmmreg looked like this:
lmm_vds = assoc_vds.lmmreg(kinship_vds, 'sa.pheno', ['sa.cov1', 'sa.cov2'])
Now, rather than requiring kinship_vds, lmmreg requires an instance of a new Python class called KinshipMatrix. Currently, the only way in Hail to create a KinshipMatrix is to call the new rrm() (Realized Relationship Matrix) function on a vds. For example, to reproduce the call above, you’d do the following:
km = kinship_vds.rrm()
lmm_vds = assoc_vds.lmmreg(km, 'sa.pheno', ['sa.cov1', 'sa.cov2'])
There are a few advantages to this change. The first is that if you want to run lmmreg several times with the same samples and variants but with a different phenotype and/or different covariates, you can reuse the same KinshipMatrix. When lmmreg is called, Hail will automatically filter out samples in the KinshipMatrix that are missing the annotations for the specified phenotype or covariates, just as it did with the old kinship_vds.
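As a minimal sketch of that reuse pattern, the helper below runs lmmreg once per phenotype against a single KinshipMatrix. The helper name lmmreg_for_each_phenotype and the annotation names in the usage comment are hypothetical; the only Hail call it relies on is lmmreg(km, phenotype, covariates) as shown above.

```python
def lmmreg_for_each_phenotype(assoc_vds, km, phenotypes, covariates):
    """Run lmmreg once per phenotype, reusing the same KinshipMatrix.

    Hypothetical helper for illustration; it only calls
    assoc_vds.lmmreg(km, phenotype, covariates), as in the post.
    Returns a dict mapping each phenotype annotation to its result.
    """
    results = {}
    for pheno in phenotypes:
        # km is built once by the caller and passed unchanged to every call.
        results[pheno] = assoc_vds.lmmreg(km, pheno, covariates)
    return results


# Usage with Hail (annotation names here are made up):
#   km = kinship_vds.rrm()
#   results = lmmreg_for_each_phenotype(
#       assoc_vds, km, ['sa.pheno1', 'sa.pheno2'], ['sa.cov1', 'sa.cov2'])
```

The kinship matrix is computed once, outside the loop, which is the point of the change.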
The second advantage is that, unlike the old lmmreg, where you had to use the RRM to compute kinship, you are now free to try out different types of kinship matrix, like the GRM or IBD matrices. Currently, GRM and IBD don’t return a KinshipMatrix, but that change will be pushed out shortly.
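For readers unfamiliar with what a realized relationship matrix contains, here is a rough pure-Python sketch. It assumes the usual definition: each variant column of the genotype matrix is mean-centered and scaled by its empirical variance, and K is the product of the scaled matrix with its transpose. It is not Hail's implementation; the function name is made up, and missing genotypes are not handled.

```python
import math


def realized_relationship_matrix(genotypes):
    """Illustrative RRM for a small, complete genotype matrix.

    genotypes: list of n sample rows, each a list of m alternate-allele
    counts (0, 1 or 2). Assumes no missing calls.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / float(n)
        centered = [g - mean for g in col]
        sum_sq = sum(d * d for d in centered)
        if sum_sq > 0:  # skip variants with no variation across samples
            columns.append((centered, sum_sq))

    m = len(columns)
    # Scale each column so that its empirical variance is 1/m.
    scaled = [[d / math.sqrt(sum_sq * m / float(n)) for d in centered]
              for centered, sum_sq in columns]

    # K = M * M^T, an n x n matrix indexed by sample.
    return [[sum(col[a] * col[b] for col in scaled) for b in range(n)]
            for a in range(n)]
```

With this scaling the diagonal of K sums to the number of samples. A GRM differs in how the columns are scaled: it uses the variance expected under Hardy-Weinberg equilibrium rather than the empirical variance.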
Finally, by calling matrix() on a KinshipMatrix, you can get the Spark IndexedRowMatrix that backs it and use any of the tools available in pyspark to analyze the matrix.
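As one possible use, the sketch below collects a small IndexedRowMatrix into a plain nested list for local inspection. The helper name kinship_to_dense is hypothetical. It uses only numRows(), numCols(), the rows RDD, and each row's index and vector, and it assumes the matrix is small enough to fit on the driver.

```python
def kinship_to_dense(indexed_row_matrix):
    """Collect an IndexedRowMatrix into a list of lists.

    Hypothetical helper for illustration. Only sensible for small
    matrices, since collect() pulls every row onto the driver.
    """
    n_rows = int(indexed_row_matrix.numRows())
    n_cols = int(indexed_row_matrix.numCols())
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for row in indexed_row_matrix.rows.collect():
        # Each IndexedRow carries its row index and a vector of values.
        dense[int(row.index)] = [float(x) for x in row.vector.toArray()]
    return dense


# Usage with Hail, per the post:
#   dense = kinship_to_dense(km.matrix())
```

For matrices with many samples, it is usually better to stay in Spark and work with the distributed matrix directly.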
See the documentation for lmmreg for more information.