Hail matrix column to list

I want to extract all the values of a column from a Hail MatrixTable into a list. Is there a command to do so?
For example, my MatrixTable mt has a column field s. If I do list(mt.s), it fails.

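If you just want one column as a list, you can do mt.s.collect(). mt.s is a lazy Hail expression rather than a Python sequence, which is why list(mt.s) fails; collect() evaluates it and returns a local Python list.

If you want multiple columns represented in the list, you can do mt.cols().collect(), which gives you a list of Structs that you can then transform as desired.

For example,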

from hail.utils import range_matrix_table

mt = range_matrix_table(10, 10)
mt = mt.annotate_cols(x=mt.col_idx**2, y=mt.col_idx/2)
cols = mt.cols().collect()

gets us this for the value of cols:

[Struct(col_idx=0, x=0.0, y=0.0),
 Struct(col_idx=1, x=1.0, y=0.5),
 Struct(col_idx=2, x=4.0, y=1.0),
 Struct(col_idx=3, x=9.0, y=1.5),
 Struct(col_idx=4, x=16.0, y=2.0),
 Struct(col_idx=5, x=25.0, y=2.5),
 Struct(col_idx=6, x=36.0, y=3.0),
 Struct(col_idx=7, x=49.0, y=3.5),
 Struct(col_idx=8, x=64.0, y=4.0),
 Struct(col_idx=9, x=81.0, y=4.5)]

and

[(entry.x, entry.y) for entry in cols]

produces this:

[(0.0, 0.0),
 (1.0, 0.5),
 (4.0, 1.0),
 (9.0, 1.5),
 (16.0, 2.0),
 (25.0, 2.5),
 (36.0, 3.0),
 (49.0, 3.5),
 (64.0, 4.0),
 (81.0, 4.5)]

where the first item of each tuple is from the x column, the second is from the y column, and the col_idx column has been omitted entirely.
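
If you would rather end up with one list per column instead of a list of tuples, the collected records can be rearranged in plain Python. Here is a minimal sketch of that step. So that it runs without Hail installed, it uses a namedtuple called Record as a stand-in for the Struct objects that mt.cols().collect() returns; Record and the values built here are illustrative assumptions, not part of Hail. With real Structs, the attribute access (entry.x, entry.y) should look the same.

```python
from collections import namedtuple

# Hypothetical stand-in for the Struct objects returned by
# mt.cols().collect(); used only so this sketch runs without Hail.
Record = namedtuple("Record", ["col_idx", "x", "y"])
cols = [Record(col_idx=i, x=float(i ** 2), y=i / 2) for i in range(10)]

# One list per column, using attribute access on each record.
xs = [entry.x for entry in cols]
ys = [entry.y for entry in cols]

# Alternatively, build the (x, y) pairs first and transpose them with zip.
pairs = [(entry.x, entry.y) for entry in cols]
xs_zipped, ys_zipped = (list(column) for column in zip(*pairs))
```

Either way, xs and ys line up by position, so xs[i] and ys[i] come from the same column of the matrix table.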
