Table partitioning

Hi hail team!

I have a question about Hail table partitioning. I started looking at a table I wrote a few weeks ago (using version 0.2.70) and realized I couldn’t run any quick checks on it. For example, I tried to filter the table to a single transcript and show the top 10 rows, but my commands ran for at least an hour on my laptop without producing any output.

I realized that the slowness is due to the number of partitions in the table: the table has only 435 rows but was written out with 45,567 partitions (45,336 of which are empty). Is there a way to prevent writing out a large number of empty partitions? Is this a known issue, or has it already been fixed in a newer Hail version?

Thanks!

We could add an argument to the write method that does adaptive repartitioning if the partitions are too small. That would be a good idea.
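Until such an option exists, one possible workaround is to reduce the partition count yourself and rewrite the table. The following is a minimal sketch, not an official recipe: `target_partitions`, `compact_table`, and the default of 1000 rows per partition are hypothetical names and values chosen for illustration, and the paths are placeholders. It relies on `hl.read_table`, `Table.count`, `Table.naive_coalesce`, and `Table.write`.

```python
import math


def target_partitions(n_rows: int, rows_per_partition: int = 1000) -> int:
    """Hypothetical helper: choose a partition count so that each partition
    holds roughly `rows_per_partition` rows, with a minimum of one partition."""
    return max(1, math.ceil(n_rows / rows_per_partition))


def compact_table(in_path: str, out_path: str, rows_per_partition: int = 1000) -> int:
    """Read a table, merge its partitions, and write a compacted copy.

    Returns the partition count requested for the new table.
    """
    import hail as hl  # imported here so the helper above works without Hail

    ht = hl.read_table(in_path)
    n = target_partitions(ht.count(), rows_per_partition)

    # naive_coalesce merges adjacent partitions without a shuffle,
    # so the resulting table has at most n partitions.
    ht = ht.naive_coalesce(n)

    # Write to a new location rather than overwriting the table being read.
    ht.write(out_path, overwrite=True)
    return n
```

For the 435-row table above, `compact_table("old.ht", "compacted.ht")` would request a single partition. If the rows end up unevenly distributed, `ht.repartition(n)` is an alternative that shuffles the data, at a higher cost.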
