I want to extract all the values of a column from a Hail MatrixTable into a list. Is there a command to do so?
For example, mt has a column field s. If I do list(mt.s), it fails.
If you just want one column as a list, you can do mt.s.collect().
If you want multiple columns represented in the list, you can do mt.cols().collect(), which will give you a list of Structs that you can then transform as desired.
For example,
from hail.utils import range_matrix_table
mt = range_matrix_table(10, 10)
mt = mt.annotate_cols(x=mt.col_idx**2, y=mt.col_idx/2)
cols = mt.cols().collect()
gets us this for the value of cols:
[Struct(col_idx=0, x=0.0, y=0.0),
 Struct(col_idx=1, x=1.0, y=0.5),
 Struct(col_idx=2, x=4.0, y=1.0),
 Struct(col_idx=3, x=9.0, y=1.5),
 Struct(col_idx=4, x=16.0, y=2.0),
 Struct(col_idx=5, x=25.0, y=2.5),
 Struct(col_idx=6, x=36.0, y=3.0),
 Struct(col_idx=7, x=49.0, y=3.5),
 Struct(col_idx=8, x=64.0, y=4.0),
 Struct(col_idx=9, x=81.0, y=4.5)]
and
[(entry.x, entry.y) for entry in cols]
produces this:
[(0.0, 0.0),
(1.0, 0.5),
(4.0, 1.0),
(9.0, 1.5),
(16.0, 2.0),
(25.0, 2.5),
(36.0, 3.0),
(49.0, 3.5),
(64.0, 4.0),
(81.0, 4.5)]
where the first item of each tuple is from the x column, the second is from the y column, and the col_idx column has been omitted entirely.
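If you would rather end up with one plain list per column than a list of tuples, the same collected records can be regrouped by field. The sketch below is only an illustration: it uses a namedtuple called Col as a stand-in for the Structs returned by mt.cols().collect() so that it runs without Hail, and it assumes the fields are named col_idx, x and y as in the example above. The only thing it relies on is attribute access (c.x, c.y), which Hail Structs also support.

```python
from collections import namedtuple

# Stand-in for the Structs that mt.cols().collect() returns (assumption:
# fields col_idx, x, y, matching the annotate_cols call in the example).
Col = namedtuple("Col", ["col_idx", "x", "y"])
cols = [Col(col_idx=i, x=float(i ** 2), y=i / 2) for i in range(10)]

# One list per field, using plain attribute access on each record.
xs = [c.x for c in cols]
ys = [c.y for c in cols]

# Or build every per-field list at once, keyed by field name.
field_names = ["col_idx", "x", "y"]
columns = {name: [getattr(c, name) for c in cols] for name in field_names}

print(xs)
print(columns["y"])
```

With real Hail data, cols would simply be the result of mt.cols().collect(), and the rest would stay the same.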