[BreakingChange] Changes to index_bgen and import_bgen

jigold · September 19, 2018, 8:16pm

As of hash cf235511d2ee

index_bgen:

Changed the index file format. You will need to rerun index_bgen before you can load BGEN files into Hail.
Added a new optional argument index_file_map, which lets you write the index files to a different location than where the BGEN files are stored. Be aware that the index file paths must end in .idx2.
Moved the options contig_recoding, skip_invalid_loci, and reference_genome from import_bgen to index_bgen.
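
As a rough sketch of how the new index_bgen arguments fit together, the snippet below builds an index_file_map that places every index file in a separate directory and then passes it to index_bgen. The helper names build_index_file_map and index_files, the bucket paths, and the contig recoding are made up for illustration and are not part of Hail. Hail is imported lazily so the path helper can be tried without it.

```python
import posixpath


def build_index_file_map(bgen_paths, index_dir):
    """Map each BGEN path to an index path under index_dir that ends in .idx2."""
    index_file_map = {}
    for bgen_path in bgen_paths:
        file_name = posixpath.basename(bgen_path)
        # Index file paths must end in .idx2.
        index_file_map[bgen_path] = posixpath.join(index_dir, file_name + ".idx2")
    return index_file_map


def index_files(bgen_paths, index_dir, reference_genome="GRCh37"):
    """Hypothetical wrapper: index BGEN files, writing the indices to index_dir."""
    import hail as hl  # lazy import, only needed when actually indexing

    index_file_map = build_index_file_map(bgen_paths, index_dir)
    hl.index_bgen(
        bgen_paths,
        index_file_map=index_file_map,
        # These three options now belong to index_bgen, not import_bgen.
        reference_genome=reference_genome,
        contig_recoding={"01": "1"},  # example recoding, adjust to your files
        skip_invalid_loci=False,
    )
    return index_file_map
```

Note that this helper keys the index name on the BGEN file name only, so two BGEN files with the same name in different directories would collide. In that case, build the map by hand.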

import_bgen:

Removed arguments contig_recoding, skip_invalid_loci, and reference_genome. Use these options with index_bgen instead.
Added a new optional argument variants that allows you to specify either a Python list of variants (Struct with locus and alleles), a StructExpression with two fields – locus and alleles, or a Table that is keyed by locus and alleles. This can significantly improve performance when a pipeline does not need to look at all variants in the file.
Added a new optional argument index_file_map, which lets you specify which index file to use for a given BGEN input file. By default, Hail looks for an index file at the BGEN file's path with “.idx2” appended, i.e. in the same directory as the BGEN file.
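
The following is a minimal sketch of the import side, assuming the files were already indexed with GRCh37. resolve_index_paths only mirrors the lookup rule described above (a map entry if given, otherwise the BGEN path plus .idx2). It is not Hail's own code, and whether a partial map falls back to the default for unlisted files is an assumption. import_selected_variants is a hypothetical wrapper, and the locus and alleles in it are placeholders.

```python
def resolve_index_paths(bgen_paths, index_file_map=None):
    """Return the index path expected for each BGEN file (illustration only)."""
    index_file_map = index_file_map or {}
    return {
        bgen_path: index_file_map.get(bgen_path, bgen_path + ".idx2")
        for bgen_path in bgen_paths
    }


def import_selected_variants(bgen_paths, index_file_map=None):
    """Hypothetical wrapper: load only a few variants from the BGEN files."""
    import hail as hl  # lazy import, only needed when actually importing

    # A Python list of Structs with locus and alleles; placeholder values.
    variants = [
        hl.Struct(
            locus=hl.Locus("1", 10000, reference_genome="GRCh37"),
            alleles=["A", "G"],
        ),
    ]
    # No reference_genome, contig_recoding, or skip_invalid_loci here:
    # those were set when the files were indexed.
    return hl.import_bgen(
        bgen_paths,
        entry_fields=["GT"],
        index_file_map=index_file_map,
        variants=variants,
    )
```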

Things to be aware of:

When loading multiple BGEN files with import_bgen, all of the files must have been indexed with the same reference_genome argument. For example, you cannot index one file with GRCh37 and another with GRCh38 and then load both files at the same time.
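
If you track which reference genome each file was indexed with, a small pre-flight check can catch a mismatch before calling import_bgen. This is only a sketch: check_single_reference_genome and the genome_by_file dictionary are hypothetical bookkeeping on the user's side, not something Hail provides.

```python
def check_single_reference_genome(genome_by_file):
    """Raise if the files were indexed with more than one reference genome."""
    genomes = set(genome_by_file.values())
    if len(genomes) > 1:
        raise ValueError(
            "BGEN files were indexed with different reference genomes: "
            + ", ".join(sorted(genomes))
        )
    # Return the shared genome, or None if no files were given.
    return next(iter(genomes)) if genomes else None
```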
