[Breaking Change] Hail 0.2 import_bgen should be passed a min_partitions parameter

Previously, some Spark heuristics chose a reasonable number of partitions for import_bgen. This logic has been removed while we improve bgen performance (we may add it back before the 0.2 release). In the meantime, you will notice that import_bgen uses far fewer partitions than is appropriate. You can use the min_partitions argument to force a higher number of partitions. For the UK Biobank (UKBB) dataset, try using around 18,000 partitions.
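As a rough illustration, here is a minimal sketch of passing min_partitions to hl.import_bgen. The helper name import_bgen_with_partitions, the file paths, and the choice of entry_fields are hypothetical; the default of 18,000 follows the UKBB suggestion above. Parameter names may differ in other Hail builds, so check the import_bgen signature in the version you have installed.

```python
def import_bgen_with_partitions(bgen_path, sample_file, min_partitions=18_000):
    """Import a BGEN file with Hail 0.2, forcing a minimum number of partitions.

    Hypothetical helper: the paths passed in and the entry_fields choice
    are illustrative assumptions, not a recommended configuration.
    """
    # Imported inside the function so this helper can be defined
    # (e.g. in a shared utils module) even where Hail is not installed.
    import hail as hl

    return hl.import_bgen(
        bgen_path,
        entry_fields=['GT', 'GP'],      # genotype fields to load; adjust as needed
        sample_file=sample_file,        # .sample file providing sample IDs
        min_partitions=min_partitions,  # override the (currently low) default partitioning
    )
```

For example, with hypothetical paths, `mt = import_bgen_with_partitions('data/ukb_chr1.bgen', 'data/ukb.sample')` followed by `mt.n_partitions()` lets you confirm that the partition count went up.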
