Subset HailTable based on an array of indexes

Let’s say I have an array that stores a bunch of indexes, [0, 5, 12, 23, …], and I want to get the corresponding rows from a HailTable, similar to how it can be done with numpy arrays. Is there a way?

I don’t think we expose any efficient mechanism to index a table or matrix table by row index. You could use add_row_index (for a matrix table) or add_index (for a table) and then filter, but that will scan all the rows.

Is there a reason you can’t filter by the key? Alternatively, if you want to filter by index, make that the key of the table with key_by or the key of the matrix table with key_rows_by.

add_index may work, since it’s a one-time operation and it’s OK if it takes a long time. Would you say this is the correct way of doing it?

import numpy as np
import hail as hl
index_set = set(np.array([0, 2, 8, 23], dtype=np.int64))
input_ht = input_ht.add_index('row_idx')  # adds a zero-based int64 index field
selected_rows = input_ht.filter(hl.literal(index_set).contains(input_ht.row_idx))

Yeah, this is what I’d do.
