hail.java.FatalError: FileNotFoundException: ... .bgen.idx does not exist


I’m getting an error and I don’t know why. As far as I know, bgen files don’t need to be indexed before import (and I’ve never seen a .bgen.idx file), so can someone tell me what’s happening here?

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 2.1.0
SparkUI available at
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-6e815ac
2017-11-05 12:56:53 Hail: INFO: Number of BGEN files parsed: 23
2017-11-05 12:56:53 Hail: INFO: Number of samples in BGEN files: 7135
2017-11-05 12:56:53 Hail: INFO: Number of variants across all BGEN files: 40405505
Traceback (most recent call last):
  File "regression1.py", line 22, in <module>
    hc.import_bgen('/mnt/volume/imputed_genotypes/*.bgen', sample_file='/mnt/volume/imputed_genotypes/MT_chr1.sample').split_multi().write('/mnt/volume/imputed_genotypes/MT.vds')
  File "<decorator-gen-476>", line 2, in import_bgen
  File "/usr/local/hail/python/hail/java.py", line 121, in handle_py4j
    'Error summary: %s' % (deepest, full, Env.hc().version, deepest))
hail.java.FatalError: FileNotFoundException: File file:/mnt/volume/imputed_genotypes/MT_chr7.bgen.idx does not exist

The bgen files were created by converting imputed VCFs (the output of the Sanger imputation pipeline) to bgen with QCtool v2 rc6:

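for i in {1..22} X; do
  # convert each chromosome's imputed VCF to BGEN, using the GP field as genotype probabilities
  qctool2 -g /mnt/volume/imputed_genotypes/MT.vcfs/$i.vcf.gz -og MT_chr$i.bgen -os MT_chr$i.sample -assume-chromosome $i -vcf-genotype-field GP
done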

Hail uses an index file, which it creates itself, to work with bgen files more efficiently. Run index_bgen() first to create the .bgen.idx index file for each bgen file (you only need to do this once, and you should do it as a separate step; see the docs). After that, import_bgen() will work.
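A minimal sketch of that workflow, assuming the Hail 0.1 API shown in the traceback (`hc.index_bgen(path)` taking a path or list of paths, and `hc.import_bgen(path, sample_file=...)`), and assuming the bgen files sit on a local filesystem so Python's `glob` can list them (it won't work for `gs://` or `hdfs://` paths). The helper names `missing_bgen_indices` and `index_then_import` are hypothetical, not part of Hail. The first one checks for the `<file>.bgen.idx` path that the error message says Hail looked for.

```python
import glob
import os


def missing_bgen_indices(pattern):
    """Hypothetical helper: return the local .bgen files matching `pattern`
    that have no `<file>.bgen.idx` next to them (the path Hail 0.1 looks for)."""
    return sorted(
        path for path in glob.glob(pattern)
        if not os.path.exists(path + ".idx")
    )


def index_then_import(hc, pattern, sample_file):
    """Hypothetical helper: index any not-yet-indexed bgen files, then import.

    `hc` is assumed to be a Hail 0.1 HailContext.
    """
    missing = missing_bgen_indices(pattern)
    if missing:
        # Indexing only has to happen once per file; on later runs the
        # .idx files already exist and this branch is skipped.
        hc.index_bgen(missing)
    return hc.import_bgen(pattern, sample_file=sample_file)
```

With the paths from the question this would be `index_then_import(hc, '/mnt/volume/imputed_genotypes/*.bgen', '/mnt/volume/imputed_genotypes/MT_chr1.sample')` followed by the same `.split_multi().write(...)` chain. Since indexing is best done as a separate step, you could equally run just `hc.index_bgen(missing_bgen_indices(...))` in a one-off script and keep the import script unchanged.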

As an aside, there’s also a new bgen indexing tool, bgenix, to work with bgen files in non-Hail contexts.

Thank you. I did miss the “(assuming it has already been indexed)”. I searched for .idx and nothing came up; obviously I should have searched for “index” instead.

Yes, I know about bgenix, but its index files use a different extension, so I figured Hail doesn’t use them.

Thanks for the documentation feedback, @Stephane_Bourgeois. I’ve created three PRs to address the issues you ran into:

