[Breaking Change] Hail 0.2 import_bgen should be passed a min_partitions parameter

danking · July 12, 2018, 12:46am

Previously, Spark heuristics chose a reasonable number of partitions for import_bgen. That logic has been removed while we improve BGEN performance (we may add it back before the 0.2 release). In the meantime, import_bgen will use far fewer partitions than is appropriate unless you tell it otherwise. Pass the min_partitions argument to force a higher number of partitions. For the UKBB dataset, try around 18,000 partitions.
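For anyone who wants a starting point, here is a minimal sketch of passing min_partitions. The helper names, the file paths, and the choice of `entry_fields=['GT']` are assumptions for illustration only; the `min_partitions` keyword is the one named in this post, and other Hail versions may spell the partitioning argument differently, so check the signature of `hl.import_bgen` in the build you are running.

```python
# Sketch only: helper names and paths are hypothetical, not part of Hail.
UKBB_MIN_PARTITIONS = 18_000  # rough figure suggested in this post for UKBB


def bgen_import_kwargs(path, sample_file, min_partitions=UKBB_MIN_PARTITIONS):
    """Build keyword arguments for hl.import_bgen with an explicit partition floor."""
    if min_partitions < 1:
        raise ValueError("min_partitions must be a positive integer")
    return {
        "path": path,
        "entry_fields": ["GT"],            # assumption: only genotype calls are needed
        "sample_file": sample_file,
        "min_partitions": min_partitions,  # keyword as named in this post
    }


def import_bgen_partitioned(path, sample_file, min_partitions=UKBB_MIN_PARTITIONS):
    """Import a BGEN file, forcing at least `min_partitions` partitions."""
    import hail as hl  # imported lazily so the kwargs helper works without Hail

    return hl.import_bgen(**bgen_import_kwargs(path, sample_file, min_partitions))
```

Usage would look something like `mt = import_bgen_partitioned('gs://my-bucket/chr1.bgen', 'gs://my-bucket/chr1.sample')`, where both paths are placeholders. For datasets much smaller than UKBB, a proportionally smaller min_partitions is probably a better fit than 18,000.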
