hail.java.FatalError: FileNotFoundException: ... .bgen.idx does not exist

Stephane_Bourgeois · November 5, 2017, 1:19pm

Hi,

I’m getting an error and I don’t know why. As far as I know, bgen files don’t require indexing before importing (and I’ve never seen a .bgen.idx file), so can someone tell me what’s happening here?

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 2.1.0
SparkUI available at http://10.2.213.83:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-6e815ac
2017-11-05 12:56:53 Hail: INFO: Number of BGEN files parsed: 23
2017-11-05 12:56:53 Hail: INFO: Number of samples in BGEN files: 7135
2017-11-05 12:56:53 Hail: INFO: Number of variants across all BGEN files: 40405505
Traceback (most recent call last):
  File "regression1.py", line 22, in <module>
    hc.import_bgen('/mnt/volume/imputed_genotypes/*.bgen', sample_file='/mnt/volume/imputed_genotypes/MT_chr1.sample').split_multi().write('/mnt/volume/imputed_genotypes/MT.vds')
  File "<decorator-gen-476>", line 2, in import_bgen
  File "/usr/local/hail/python/hail/java.py", line 121, in handle_py4j
    'Error summary: %s' % (deepest, full, Env.hc().version, deepest))
hail.java.FatalError: FileNotFoundException: File file:/mnt/volume/imputed_genotypes/MT_chr7.bgen.idx does not exist

The bgen files were created by converting imputed VCFs (output of Sanger imputation pipeline) to bgen using QCtool v2 rc6:

for i in {1..22} X
do
qctool2 -g /mnt/volume/imputed_genotypes/MT.vcfs/$i.vcf.gz -og MT_chr$i.bgen -os MT_chr$i.sample -assume-chromosome $i -vcf-genotype-field GP
done

maryhaas · November 5, 2017, 5:07pm

Hail uses an index file, which it creates itself, to work with bgen files more efficiently. You should run index_bgen() first to create the .idx index file. You only need to do this once, and it should be done as a separate step (see the docs). After that, import_bgen() will work.
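If you are not sure which files still need indexing, a small stdlib-only check can list the .bgen files that have no .bgen.idx file next to them. This is a minimal sketch, assuming the Hail 0.1 naming convention visible in the error message (the index for MT_chr7.bgen is expected at MT_chr7.bgen.idx in the same directory) and local paths such as /mnt/volume. `missing_bgen_indices` is a hypothetical helper, not part of Hail.

```python
import glob
import os
from typing import List


def missing_bgen_indices(pattern: str) -> List[str]:
    """Return the .bgen files matching `pattern` that lack a `<file>.idx` index.

    Hypothetical helper. Assumes the Hail 0.1 convention that the index for
    `X.bgen` lives at `X.bgen.idx` in the same directory. Only checks the
    local filesystem, not HDFS or cloud storage.
    """
    bgen_files = sorted(glob.glob(pattern))
    return [path for path in bgen_files if not os.path.exists(path + ".idx")]


if __name__ == "__main__":
    pattern = "/mnt/volume/imputed_genotypes/*.bgen"
    missing = missing_bgen_indices(pattern)
    print("%d BGEN file(s) without a .idx index" % len(missing))
    for path in missing:
        print("  " + path)
    # In Hail 0.1 you would then index each file once, as a separate step:
    #   for path in missing:
    #       hc.index_bgen(path)
    # and only afterwards run hc.import_bgen(pattern, sample_file=...).
```

Running the check before import_bgen() tells you up front which chromosomes still need index_bgen(), instead of discovering them one FileNotFoundException at a time.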

As an aside, there’s also a new bgen indexing tool, bgenix, to work with bgen files in non-Hail contexts.

Stephane_Bourgeois · November 6, 2017, 10:15am

Thank you, I did miss the “(assuming it has already been indexed)” note. I searched for .idx and nothing came up… I obviously should have searched for “index” instead.

Yes, I know about bgenix, but the extension is different, so I figured Hail doesn’t use these.

danking · November 10, 2017, 6:39pm

Thanks for the documentation feedback @Stephane_Bourgeois, I’ve created three PRs to address the issues you faced:
