Shuffle behavior with/without hl.eval

Hello Hail team, I ran into some behavior I found unexpected when using hl.shuffle and was hoping you could shed some light on what I am seeing. Essentially, when I try to shuffle an array and take a subset of the permutation to filter a table, I get an unexpected result if I do not use hl.eval on the permutation subset before filtering. Here is a simple example to demonstrate:

> ht = hl.utils.range_table(100)
> idx_permut = hl.shuffle(hl.range(100))
> ht.filter(hl.set(idx_permut[:10]).contains(ht.idx)).show()
+-------+
|   idx |
+-------+
| int32 |
+-------+
|    52 |
|    66 |
|    67 |
|    70 |
|    75 |
|    77 |
|    83 |
+-------+
> ht.filter(hl.set(hl.eval(idx_permut[:10])).contains(ht.idx)).show()
+-------+
|   idx |
+-------+
| int32 |
+-------+
|    21 |
|    27 |
|    42 |
|    45 |
|    60 |
|    67 |
|    81 |
|    93 |
|    94 |
|    95 |
+-------+

Without hl.eval, the number of filtered rows doesn’t match the purported size of the subset and it also does not contain the same elements as with hl.eval. Could you help me understand what is going on here? Does it have to do with lazy evaluation?


Does it have to do with lazy evaluation?

Yes.

This:

> idx_permut = hl.shuffle(hl.range(100))
> ht.filter(hl.set(idx_permut[:10]).contains(ht.idx)).show()

is exactly the same as:

> ht.filter(hl.set(hl.shuffle(hl.range(100))[:10]).contains(ht.idx)).show()
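To see why the row count drifts away from 10, here is a rough plain-Python model of the two cases. It is only an analogy and not how Hail is implemented: it assumes the shuffle is redrawn for each row in the first case, and the function names are made up for illustration.

```python
import random


def filter_reshuffled_per_row(n_rows, k, rng):
    """Model of the lazy case: a fresh k-element subset is drawn for every row."""
    kept = []
    for idx in range(n_rows):
        # Equivalent in distribution to taking the first k items of a new shuffle.
        subset = set(rng.sample(range(n_rows), k))
        if idx in subset:
            kept.append(idx)
    return kept


def filter_fixed_subset(n_rows, k, rng):
    """Model of the hl.eval case: the subset is drawn once and reused for all rows."""
    subset = set(rng.sample(range(n_rows), k))
    return [idx for idx in range(n_rows) if idx in subset]


rng = random.Random(0)
lazy_rows = filter_reshuffled_per_row(100, 10, rng)
fixed_rows = filter_fixed_subset(100, 10, rng)

# The fixed subset always keeps exactly 10 rows; the per-row version keeps each
# row independently with probability 10/100, so its count only averages 10.
print(len(lazy_rows), len(fixed_rows))
```

Under this model each row passes the filter with probability 0.1 on its own, which is consistent with seeing 7 rows rather than 10 in the original example.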

If you want to use a variable in a Hail expression so that its value is constant (rather than being re-evaluated for every row, column, or entry), you can either use hl.literal to make it a “literal expression”, or annotate it into the table globals with ht = ht.annotate_globals(idx_permut=...) and refer to it as ht.idx_permut in row computations.

Also, please feel heartened that this is one of the most complicated areas of the Hail interface – the development team has argued about this for years and roughly concluded that there are other designs that might make your example “predictable”, but would make others more confusing.
