Best way to randomly permute the columns of a Table or MatrixTable?


What would be the best way to randomly permute the columns of a Table or a MatrixTable? Is there something similar to assigning df.columns = new_list in pandas?


You can permute the columns of a matrix table with choose_cols.
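For illustration, here is a minimal sketch of that call. It assumes mt is an existing hail.MatrixTable; the helper names shuffled_col_indices and permute_cols are made up for this example, while choose_cols and count_cols are the MatrixTable methods.

```python
import random


def shuffled_col_indices(n_cols, seed=None):
    """Return a random permutation of the column indices 0..n_cols-1."""
    rng = random.Random(seed)
    indices = list(range(n_cols))
    rng.shuffle(indices)
    return indices


def permute_cols(mt, seed=None):
    """Reorder the columns of a MatrixTable at random.

    Assumes `mt` is a hail.MatrixTable. choose_cols takes a list of
    column indices and returns a matrix table with columns in that order.
    """
    return mt.choose_cols(shuffled_col_indices(mt.count_cols(), seed))
```

Usage would be something like mt_shuffled = permute_cols(mt, seed=0). Passing a seed only makes the shuffle reproducible.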

Tables have fields, not columns, in Hail speak. We don’t yet have a nice way to turn a table into a matrix table row by row. You could create a single array field from many fields in the table, and then annotate_entries on a matrix table with the same row keys, using the column index as the array index. Alternatively, you could export the table to TSV and read it back with import_matrix_table.

Just to clarify, choose_cols will permute both the entry and column-indexed fields, so each column’s entries move together with its column fields. I’m not sure if this is specifically what you’re trying to do.

I’m trying to obtain a set of null stats for linear_regression by permuting the genotypes. But since choose_cols permutes both the entry and column-indexed fields, I’m guessing it won’t give me the permutation I need, because the genotypes and phenotypes would stay paired?

Agreed. Could you just annotate with a bunch of permutations of the phenotype, leaving the genotypes in place?

That’ll be way better. Extract the phenotypes to a Python list, shuffle it, wrap it in hl.literal, and annotate using the column index.
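A rough sketch of those steps, under some assumptions: the phenotype is a numeric column field (called pheno here, a placeholder name), and the helper names and the perm_pheno field are made up for this example. Hail is imported inside the function so the shuffling helper can be used on its own.

```python
import random


def permuted_phenotype_rows(values, n_perms, seed=None):
    """Shuffle `values` n_perms times and return one row per sample.

    Row j holds the j-th element of each shuffled copy, so the result
    has len(values) rows of length n_perms.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(n_perms):
        shuffled = list(values)
        rng.shuffle(shuffled)
        perms.append(shuffled)
    # Transpose: from one list per permutation to one list per sample.
    return [list(row) for row in zip(*perms)]


def annotate_permuted_phenotypes(mt, pheno_field="pheno", n_perms=10, seed=None):
    """Add an array column field `perm_pheno` of shuffled phenotypes.

    Assumes `mt` is a hail.MatrixTable and `pheno_field` (hypothetical
    name) is a numeric column field. Genotypes are left in place.
    """
    import hail as hl  # imported lazily; only needed for this function

    # The collected order does not matter, since the values are shuffled.
    phenos = mt[pheno_field].collect()
    rows = hl.literal(permuted_phenotype_rows(phenos, n_perms, seed))
    mt = mt.add_col_index()  # adds the column field `col_idx` (int64)
    return mt.annotate_cols(perm_pheno=rows[hl.int32(mt.col_idx)])
```

Each element mt.perm_pheno[k] could then be used as a response variable in the regression to build the null distribution. For a very large number of samples or permutations the literal gets big, so this is only meant as a starting point.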