Adding sample labels to a relationship matrix

JanErik · April 13, 2019, 8:36pm

I have created a Hail MatrixTable from a VCF with 591 samples. I have had good success using hail.realized_relationship_matrix. It would be helpful, however, to have row and column labels on the resulting matrix. The solution I came up with was to convert the block matrix to a NumPy ndarray and then wrap the ndarray in a pandas DataFrame, using the list of samples as row and column names:

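import hail as hl
import pandas as pd

# mt is the MatrixTable imported from the VCF
rrm = hl.realized_relationship_matrix(mt.GT)
rrm_npy = rrm.to_numpy()  # BlockMatrix -> NumPy ndarray
samples = mt.s.collect()  # sample IDs as a Python list
rrm_panda = pd.DataFrame(rrm_npy, index=samples, columns=samples)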

My question: does this seem like a robust solution? What’s opaque to me is whether the block matrix indices are bound to match the indices of the array created by mt.s.collect().

My kudos to the hail team – it is awesome.
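For readers with the same worry, a few cheap invariants can be checked before trusting the labelled frame. The following is a minimal sketch in NumPy and pandas only, not part of the Hail API: `label_square_matrix` is a hypothetical helper, the toy array stands in for `rrm.to_numpy()`, and the toy list stands in for `mt.s.collect()`. It checks shape, label uniqueness, and symmetry; it cannot by itself prove that the ordering of labels matches the ordering of the matrix.

```python
import numpy as np
import pandas as pd


def label_square_matrix(matrix, sample_ids):
    """Hypothetical helper: wrap a square matrix in a DataFrame labelled by sample ID."""
    matrix = np.asarray(matrix)
    n = len(sample_ids)
    if matrix.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got shape {matrix.shape}")
    if len(set(sample_ids)) != n:
        raise ValueError("sample IDs must be unique to be usable as labels")
    # A relationship matrix should be symmetric; a failure hints at a problem upstream.
    if not np.allclose(matrix, matrix.T):
        raise ValueError("matrix is not symmetric")
    return pd.DataFrame(matrix, index=sample_ids, columns=sample_ids)


# Toy stand-ins for rrm.to_numpy() and mt.s.collect().
toy_rrm = np.array([[1.0, 0.2, 0.1],
                    [0.2, 0.9, 0.3],
                    [0.1, 0.3, 1.1]])
toy_samples = ["S1", "S2", "S3"]

labelled = label_square_matrix(toy_rrm, toy_samples)
print(labelled.loc["S1", "S3"])
```

Looking up a pair by name, as in the last line, is the main convenience the labels buy.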

tpoterba · April 14, 2019, 12:39am

This is a topic that has come up before, I think. I suppose the answer will depend on what you want to do downstream. Your code looks fine, but it won’t scale, and converting between Hail objects and Python objects can be very slow.

One of the natural things to do may be to convert it to a MatrixTable using BlockMatrix.to_matrix_table_row_major.

Once it is again a matrix table, you can put the keys back in:

rrm = hl.realized_relationship_matrix(mt.GT)
rrm_mt = rrm.to_matrix_table_row_major()

sample_ids = hl.literal(mt.s.collect())
rrm_mt = rrm_mt.key_rows_by(s1 = sample_ids[rrm_mt.row_idx])
rrm_mt = rrm_mt.key_cols_by(s2 = sample_ids[rrm_mt.col_idx])
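As a rough mental model of what the two keying calls above do, each cell's integer position is replaced by a lookup into the collected list of sample IDs. The sketch below mimics that lookup in plain Python on toy data. It does not use Hail, and `relabel_entries` and the toy values are illustrative assumptions rather than anything from the thread.

```python
def relabel_entries(matrix, sample_ids):
    """Yield (s1, s2, value) for every cell, looking labels up by integer position."""
    for row_idx, row in enumerate(matrix):
        for col_idx, value in enumerate(row):
            yield sample_ids[row_idx], sample_ids[col_idx], value


# Toy stand-ins for the relationship matrix and the collected sample IDs.
toy_matrix = [[1.0, 0.2],
              [0.2, 0.9]]
toy_samples = ["S1", "S2"]

entries = list(relabel_entries(toy_matrix, toy_samples))
for s1, s2, value in entries:
    print(s1, s2, value)
```

The lookup is only as good as the assumption that position i in the list corresponds to row and column i of the matrix, which is the same assumption the pandas approach relies on.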

JanErik · April 14, 2019, 2:59am

Cool.

To get it to work for me, I needed to cast the MatrixTable indices to int32:

rrm_mt = rrm_mt.key_rows_by(s1 = sample_ids[hl.int32(rrm_mt.row_idx)])
rrm_mt = rrm_mt.key_cols_by(s2 = sample_ids[hl.int32(rrm_mt.col_idx)])

tpoterba · April 14, 2019, 1:55pm

Ah, yes! That always comes up. It’s a bit annoying, but better than either doing an unsafe cast or an expensive check automatically.
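The trade-off mentioned here can be illustrated in general terms. The sketch below is plain Python and says nothing about Hail's internals: a 64-bit integer that does not fit in 32 bits wraps silently under an unchecked narrowing, while a checked narrowing has to test the range of every value. `unchecked_to_int32` and `checked_to_int32` are hypothetical names used only for this example.

```python
import ctypes

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def unchecked_to_int32(value):
    """Narrow to 32 bits without checking; out-of-range values wrap around."""
    return ctypes.c_int32(value).value


def checked_to_int32(value):
    """Narrow to 32 bits, raising if the value does not fit."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a 32-bit integer")
    return value


small = 590            # e.g. the last index among 591 samples
too_big = 2**31 + 5    # larger than any 32-bit signed integer

print(unchecked_to_int32(small))    # unchanged, because it fits
print(unchecked_to_int32(too_big))  # silently wraps to a negative number
print(checked_to_int32(small))      # unchanged, after a range test
```

With only 591 samples the indices are far below the 32-bit limit, so the explicit hl.int32(...) cast used in this thread loses nothing.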
