I have created a Hail matrix table from a VCF with 591 samples. I have had good success using hail.realized_relationship_matrix. It would be helpful, however, to have row and column labels for the resulting matrix. The solution I came up with was to convert the block matrix to a NumPy ndarray and then wrap the ndarray in a pandas DataFrame, using the list of sample IDs as both row and column names:
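import hail as hl
import pandas as pd
rrm = hl.realized_relationship_matrix(mt.GT)  # sample-by-sample BlockMatrix
rrm_npy = rrm.to_numpy()  # materialize locally as a NumPy ndarray
samples = mt.s.collect()  # sample IDs, used as labels on both axes
rrm_panda = pd.DataFrame(rrm_npy, index=samples, columns=samples)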
My question: does this seem like a robust solution? What’s opaque to me is whether the row and column indices of the block matrix are guaranteed to follow the same order as the list returned by mt.s.collect().
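To make the labelling step a bit more defensive, here is a minimal sketch of a helper I could use. The function name label_square_matrix and the toy data are made up for illustration and are not part of Hail. It only checks that the labels fit the matrix shape and are unique; it cannot by itself prove that the order matches.

```python
import numpy as np
import pandas as pd


def label_square_matrix(matrix, labels):
    """Wrap a square array in a DataFrame labelled identically on both axes.

    Hypothetical helper (not a Hail API): it validates the shape and the
    uniqueness of the labels, but it cannot verify the sample order itself.
    """
    matrix = np.asarray(matrix)
    labels = list(labels)
    n = len(labels)
    if matrix.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got shape {matrix.shape}")
    if len(set(labels)) != n:
        raise ValueError("labels must be unique")
    return pd.DataFrame(matrix, index=labels, columns=labels)


# Toy stand-in for rrm.to_numpy() and the collected sample IDs.
toy = np.array([[1.0, 0.2, 0.1],
                [0.2, 1.0, 0.3],
                [0.1, 0.3, 1.0]])
toy_df = label_square_matrix(toy, ["s1", "s2", "s3"])
print(toy_df.loc["s1", "s3"])
```

With the real data the call would be label_square_matrix(rrm.to_numpy(), samples). On the ordering itself, my understanding of the docs is that mt.cols() returns a table sorted by column key, and that unkeying the columns first with key_cols_by() preserves the original column order. So collecting the IDs with mt.key_cols_by().cols().s.collect() might be the more cautious choice, though I have not verified that this is needed here.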
My kudos to the Hail team; it is awesome.