[BreakingChange] Changes to index_bgen and import_bgen



As of hash cf235511d2ee

index_bgen:

  • Changed the index file format. Existing index files are no longer usable, so you must rerun index_bgen before you can load BGEN files into Hail.
  • Added a new optional argument index_file_map, which lets you write the index files to a location other than where the BGEN files are stored. Index file paths must end in .idx2.
  • Moved the contig_recoding, skip_invalid_loci, and reference_genome options from import_bgen to index_bgen.
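
A minimal sketch of indexing with the new arguments, assuming Hail is installed and imported as hl. The helper build_index_file_map, the bucket paths, and the contig_recoding value are hypothetical placeholders chosen for illustration; only index_bgen and its arguments come from the notes above.

```python
import posixpath


def build_index_file_map(bgen_paths, index_dir):
    """Map each BGEN path to an index path under index_dir that ends in .idx2.

    Hypothetical helper: index_bgen only needs a dict of BGEN path -> index path.
    """
    return {
        path: posixpath.join(index_dir, posixpath.basename(path) + ".idx2")
        for path in bgen_paths
    }


def index_all(bgen_paths, index_dir, reference_genome="GRCh37"):
    """Index every file in one call so all files share the same options."""
    import hail as hl  # imported lazily so the helper above works without Hail

    hl.index_bgen(
        bgen_paths,
        index_file_map=build_index_file_map(bgen_paths, index_dir),
        reference_genome=reference_genome,
        contig_recoding={"01": "1"},  # placeholder; depends on your files
        skip_invalid_loci=False,
    )
```

For example, build_index_file_map(["gs://bucket/data/chr1.bgen"], "gs://bucket/idx") maps the file to gs://bucket/idx/chr1.bgen.idx2.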

import_bgen:

  • Removed arguments contig_recoding, skip_invalid_loci, and reference_genome. Use these options with index_bgen instead.
  • Added a new optional argument variants that allows you to specify either a Python list of variants (Struct with locus and alleles), a StructExpression with two fields – locus and alleles, or a Table that is keyed by locus and alleles. This can significantly improve performance when a pipeline does not need to look at all variants in the file.
  • Added a new optional argument index_file_map, which lets you specify which index file to use for a given BGEN input file. By default, import_bgen looks for the index file at the BGEN file's own path with ".idx2" appended, that is, in the same directory as the BGEN file.
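
A rough sketch of loading only selected variants, assuming Hail is installed and the file has already been indexed with the new index_bgen. The function names, the example variant (1:2000 A/G), and the choice of entry_fields=["GT"] are illustrative assumptions, not part of the change itself.

```python
def default_index_path(bgen_path):
    """Default index location described above: the BGEN path plus '.idx2'."""
    return bgen_path + ".idx2"


def load_selected_variants(bgen_path, index_path=None):
    """Import a BGEN file, reading only the variants listed below."""
    import hail as hl  # imported lazily so default_index_path works without Hail

    # A Python list of Structs with locus and alleles. The locus must be valid
    # in the reference genome that was used when the file was indexed.
    variants = [
        hl.Struct(locus=hl.Locus("1", 2000), alleles=["A", "G"]),
    ]

    return hl.import_bgen(
        bgen_path,
        entry_fields=["GT"],
        # Passing the map explicitly is only needed when the index lives
        # somewhere other than the default location.
        index_file_map={bgen_path: index_path or default_index_path(bgen_path)},
        variants=variants,
    )
```

A Table keyed by locus and alleles can be passed as variants in the same way when the list of variants is too large to build in Python.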

Things to be aware of:

  • When loading multiple BGEN files in a single import_bgen call, every file must have been indexed with the same reference_genome argument. For example, you cannot index one file with GRCh37 and another with GRCh38 and then load both files at the same time.
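
One way to catch a mismatch before calling import_bgen is a small check on your side. This is only a sketch: it assumes you keep your own record of which reference genome each file was indexed with (the genome_by_file dict is hypothetical bookkeeping, not something read from the index files).

```python
def check_same_reference(genome_by_file):
    """Return the shared reference genome name, or raise if the files disagree.

    genome_by_file: dict of BGEN path -> reference genome name used at indexing.
    """
    genomes = set(genome_by_file.values())
    if len(genomes) > 1:
        raise ValueError(
            "BGEN files were indexed with different reference genomes: "
            + ", ".join(sorted(genomes))
        )
    return genomes.pop() if genomes else None
```

If the check raises, rerun index_bgen on the mismatched files with a single reference_genome before loading them together.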