[BreakingChange] Changes to index_bgen and import_bgen



As of hash cf235511d2ee

index_bgen:

  • Changed the index file format. You will need to rerun index_bgen before you can load BGEN files into Hail.
  • Added a new optional argument index_file_map, which allows you to write the index files to a different location than where the BGEN files are stored. Index file paths must end in .idx2.
  • Moved the options contig_recoding, skip_invalid_loci, and reference_genome from import_bgen to index_bgen.
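
As a rough sketch of how the new index_bgen arguments might be used together: the helper names (build_index_file_map, index_bgen_files), the bucket paths, and the default reference genome below are hypothetical, and the keyword arguments passed to hl.index_bgen are assumed to match the option names listed above.

```python
import posixpath


def build_index_file_map(bgen_paths, index_dir):
    """Map each BGEN path to an index path under index_dir.

    Hypothetical helper: every generated index path ends in ".idx2",
    as index_file_map requires.
    """
    return {
        path: posixpath.join(index_dir, posixpath.basename(path) + ".idx2")
        for path in bgen_paths
    }


def index_bgen_files(bgen_paths, index_dir, reference_genome="GRCh37",
                     contig_recoding=None, skip_invalid_loci=False):
    """Index all files in one call so they share one reference genome."""
    import hail as hl  # imported lazily so the helper above works without Hail

    index_file_map = build_index_file_map(bgen_paths, index_dir)
    hl.index_bgen(
        bgen_paths,
        index_file_map=index_file_map,
        reference_genome=reference_genome,
        contig_recoding=contig_recoding,      # e.g. {"01": "1"}, illustrative only
        skip_invalid_loci=skip_invalid_loci,
    )
    return index_file_map
```

For instance, build_index_file_map(["gs://my-bucket/data/chr1.bgen"], "gs://my-bucket/idx") would map that file to gs://my-bucket/idx/chr1.bgen.idx2. The same dictionary can then be passed to import_bgen so that it finds the relocated indexes.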

import_bgen:

  • Removed arguments contig_recoding, skip_invalid_loci, and reference_genome. Use these options with index_bgen instead.
  • Added a new optional argument variants that allows you to specify either a Python list of variants (Struct with locus and alleles), a StructExpression with two fields – locus and alleles, or a Table that is keyed by locus and alleles. This can significantly improve performance when a pipeline does not need to look at all variants in the file.
  • Added a new optional argument index_file_map, which allows you to specify which index file to use for a given BGEN input file. By default, Hail looks for an index at the BGEN file's full path with “.idx2” appended, in the same directory as the BGEN file.
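
The following is a minimal sketch of loading only a few variants with the new import_bgen arguments. The helper names (default_index_path, import_selected_variants) and the "contig:pos:ref:alt" input format are hypothetical; entry_fields and sample_file are assumed to be existing import_bgen arguments, and the exact signature at this commit may differ. The reference genome used to parse the variants is assumed to be the one the file was indexed with.

```python
def default_index_path(bgen_path):
    """Index location import_bgen looks for when index_file_map is not given."""
    return bgen_path + ".idx2"


def import_selected_variants(bgen_path, sample_file, variant_strings,
                             index_path=None, reference_genome="GRCh37"):
    """Load only the requested variants from a single BGEN file.

    variant_strings are "contig:pos:ref:alt" strings, e.g. "1:2000:A:G".
    """
    import hail as hl  # imported lazily so the helper above works without Hail

    # Build a Python list of Structs, each with `locus` and `alleles` fields.
    variants = [
        hl.eval(hl.parse_variant(v, reference_genome=reference_genome))
        for v in variant_strings
    ]

    # Only override the index location when one was supplied explicitly.
    index_file_map = {bgen_path: index_path} if index_path else None

    return hl.import_bgen(
        bgen_path,
        entry_fields=["GT"],
        sample_file=sample_file,
        index_file_map=index_file_map,
        variants=variants,
    )
```

Passing a Table keyed by locus and alleles instead of a list is likely the better choice when the set of variants is large.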

Things to be aware of:

  • When loading multiple BGEN files with import_bgen, all of the files must have been indexed with the same reference_genome argument. For example, you cannot index one file with GRCh37 and another with GRCh38 and then load both files at the same time.
