Hello Hail team, I ran into some unexpected behavior when using hl.shuffle and was hoping you could shed some light on it. When I shuffle an array and use a slice of the permutation to filter a table, I get a different result depending on whether I call hl.eval on the slice before filtering. Here is a simple example to demonstrate:
> ht = hl.utils.range_table(100)
> idx_permut = hl.shuffle(hl.range(100))
> ht.filter(hl.set(idx_permut[:10]).contains(ht.idx)).show()
+-------+
| idx |
+-------+
| int32 |
+-------+
| 52 |
| 66 |
| 67 |
| 70 |
| 75 |
| 77 |
| 83 |
+-------+
> ht.filter(hl.set(hl.eval(idx_permut[:10])).contains(ht.idx)).show()
+-------+
| idx |
+-------+
| int32 |
+-------+
| 21 |
| 27 |
| 42 |
| 45 |
| 60 |
| 67 |
| 81 |
| 93 |
| 94 |
| 95 |
+-------+
Without hl.eval, the filter returns 7 rows rather than the 10 I expected from the size of the subset, and the rows also differ from those returned with hl.eval. Could you help me understand what is going on here? Does it have to do with lazy evaluation?
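To make the lazy-evaluation guess concrete, here is a plain-Python sketch (not Hail code) of what I imagine might be happening. It assumes, without my having confirmed it, that the un-evaluated shuffle expression is re-drawn for every row, whereas hl.eval fixes a single permutation up front. The function names `filter_lazy` and `filter_eager` are hypothetical and exist only for this illustration.

```python
import random


def filter_lazy(n=100, k=10, seed=0):
    """Assumed model of the no-eval case: a fresh permutation per row."""
    rng = random.Random(seed)
    kept = []
    for idx in range(n):
        perm = rng.sample(range(n), n)  # new shuffle of 0..n-1 for this row
        if idx in set(perm[:k]):        # row kept only if it lands in its own first k
            kept.append(idx)
    return kept


def filter_eager(n=100, k=10, seed=0):
    """Assumed model of the hl.eval case: one permutation, drawn once."""
    rng = random.Random(seed)
    subset = set(rng.sample(range(n), n)[:k])  # fixed subset of size k
    return [idx for idx in range(n) if idx in subset]


lazy_rows = filter_lazy()
eager_rows = filter_eager()
print(len(lazy_rows), lazy_rows)    # row count varies with the seed
print(len(eager_rows), eager_rows)  # always exactly k rows
```

Under this model each row would be kept independently with probability 10/100, so the number of surviving rows would only average 10 rather than always equal 10, which would be consistent with the 7 rows I see above. I do not know whether this is what Hail actually does.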