Hello Hail team, I ran into some behavior I found unexpected when using hl.shuffle and was hoping you could shed some light on what I am seeing. Essentially, when I try to shuffle an array and take a subset of the permutation to filter a table, I get an unexpected result if I do not use hl.eval on the permutation subset before filtering. Here is a simple example to demonstrate:
> import hail as hl
> ht = hl.utils.range_table(100)
> idx_permut = hl.shuffle(hl.range(100))
> ht.filter(hl.set(idx_permut[:10]).contains(ht.idx)).show()
+-------+
| idx |
+-------+
| int32 |
+-------+
| 52 |
| 66 |
| 67 |
| 70 |
| 75 |
| 77 |
| 83 |
+-------+
> ht.filter(hl.set(hl.eval(idx_permut[:10])).contains(ht.idx)).show()
+-------+
| idx |
+-------+
| int32 |
+-------+
| 21 |
| 27 |
| 42 |
| 45 |
| 60 |
| 67 |
| 81 |
| 93 |
| 94 |
| 95 |
+-------+
Without hl.eval, the number of filtered rows doesn’t match the purported size of the subset and it also does not contain the same elements as with hl.eval. Could you help me understand what is going on here? Does it have to do with lazy evaluation?
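
To make my lazy-evaluation guess concrete, here is a minimal plain-Python sketch of what I suspect might be happening. It does not use Hail at all. It assumes (unconfirmed) that without hl.eval the shuffle expression is re-evaluated for every row, so each row is tested against a different permutation. The function names `filter_reshuffled_per_row` and `filter_shuffled_once` are hypothetical and exist only for this illustration.

```python
import random


def filter_reshuffled_per_row(n=100, k=10, seed=0):
    """Hypothetical model of the no-hl.eval case: a fresh permutation per row."""
    rng = random.Random(seed)
    kept = []
    for idx in range(n):
        perm = list(range(n))
        rng.shuffle(perm)  # assumption: the shuffle is redone for each row
        if idx in set(perm[:k]):
            kept.append(idx)
    return kept


def filter_shuffled_once(n=100, k=10, seed=0):
    """Model of the hl.eval case: one permutation, fixed before filtering."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    subset = set(perm[:k])  # the same 10 values are used for every row
    return [idx for idx in range(n) if idx in subset]


if __name__ == "__main__":
    print(len(filter_shuffled_once()))       # fixed subset: always k rows
    print(len(filter_reshuffled_per_row()))  # per-row shuffle: count varies
```

If this model is right, each row would be kept independently with probability 10/100, so the number of rows returned would vary around 10 from run to run instead of being exactly 10. That would be consistent with the 7 rows I got above, but I have not verified that this is what Hail actually does.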