Previously, some Spark heuristics chose a reasonable number of partitions for import_bgen. This logic has been removed while we improve bgen performance (we may add the logic back in before the 0.2 release). In the meantime, you will notice that import_bgen uses far fewer partitions than is appropriate. You can use the min_partitions argument to force a higher number of partitions. For the UKBB dataset, try using around 18,000 partitions.
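
If you are working with a dataset other than UKBB and need a starting value for min_partitions, one rough approach is to divide the total size of the bgen files by a target partition size. The sketch below is only an illustration: the helper name estimate_min_partitions and the 128 MiB default target are assumptions of this example, not part of Hail, and the right value for your data may differ.

```python
def estimate_min_partitions(total_bytes, target_partition_bytes=128 * 1024 ** 2):
    """Rough partition count: total input size divided by a target
    partition size, rounded up.

    Hypothetical helper, not part of Hail. The 128 MiB default is an
    assumption chosen for illustration only.
    """
    if total_bytes <= 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, so a partial final chunk gets its own partition.
    return -(-total_bytes // target_partition_bytes)


# Example: 10 GiB of bgen input at the default target size.
n = estimate_min_partitions(10 * 1024 ** 3)

# The result would then be passed to import_bgen, for example:
#   hl.import_bgen(path, ..., min_partitions=n)
# The other arguments depend on your Hail version and are omitted here.
```

Treat the result as a starting point and adjust it if partitions turn out too large or too small in practice.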