BGEN and BGEN index files in different directories

Hi,

I have a bit of an unusual situation. Because of things outside my control, my input BGEN files and their index files live in separate directories.

I’m wondering if it’s possible for the BGEN and index files to be in different directories. I thought the index_file_map argument might be the answer, but I’m having trouble getting it to work.

import hail as hl

bgen_index_map = {"data/bgen/chr22.bgen": "data/index/chr22.bgen.idx"}

mt = hl.import_bgen(path="data/bgen/chr22.bgen",
                    entry_fields=["GP"],
                    index_file_map=bgen_index_map)

> 2018-11-30 20:22:22 Hail: WARN: BGEN file `data/bgen/chr22.bgen' contains no sample ID block and no sample ID file given.
>   Using _0, _1, ..., _N as sample IDs.
> 2018-11-30 20:22:22 Hail: INFO: Number of BGEN files parsed: 1
> 2018-11-30 20:22:22 Hail: INFO: Number of samples in BGEN files: 487409
> 2018-11-30 20:22:22 Hail: INFO: Number of variants across all BGEN files: 1255683

These commands run without error, but as soon as I try to do anything with the resulting MatrixTable, I get a fatal error.

mt.s.show(5)

> FatalError: FileNotFoundException: data/bgen/chr22.bgen.idx

The error message makes it seem like Hail is still looking for the index file in the same directory as the BGEN file.

Any help with this would be great.

Thanks

You’re correct that the index_file_map parameter should support having index files in a different location from the BGEN data.

Could you please post the full stack trace?

Also, how did you create your index files? The default index extension in the most recent version of Hail is .idx2.
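For what it’s worth, if you have one BGEN per chromosome you don’t need to write the map by hand — you can build it programmatically. This is just a sketch assuming the chr22-style layout from your example (data/bgen/chrN.bgen with indexes under data/index/); adjust the paths to your setup.

```python
# Sketch: build an index_file_map for per-chromosome BGEN files,
# assuming data/bgen/chrN.bgen maps to data/index/chrN.bgen.idx.
bgen_paths = ["data/bgen/chr%s.bgen" % c for c in list(range(1, 23)) + ["X"]]

bgen_index_map = {
    p: p.replace("data/bgen/", "data/index/") + ".idx"
    for p in bgen_paths
}
```

The resulting dict can be passed directly as index_file_map to hl.import_bgen.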

Unfortunately, I am not using the latest version of Hail. The version I have available is build devel-f2b0dca9f506. I’m trying to get my informatics group to update the package.

Stack trace:

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<command-37581> in <module>()
----> 1 mt.s.show(5)

/databricks/spark/python/hail/typecheck/check.py in wrapper(*args, **kwargs)
    545         def wrapper(*args, **kwargs):
    546             args_, kwargs_ = check_all(f, args, kwargs, checkers, is_method=is_method)
--> 547             return f(*args_, **kwargs_)
    548 
    549         update_wrapper(wrapper, f)

/databricks/spark/python/hail/expr/expressions/base_expression.py in show(self, n, width, truncate, types)
    684             Print an extra header line with the type of each field.
    685         """
--> 686         print(self._show(n, width, truncate, types))
    687 
    688     def _show(self, n=10, width=90, truncate=None, types=True):

/databricks/spark/python/hail/expr/expressions/base_expression.py in _show(self, n, width, truncate, types)
    707         if source is not None:
    708             name = source._fields_inverse.get(self, name)
--> 709         t = self._to_table(name)
    710         if t.key is not None and name in t.key:

Unfortunately, support for the index_file_map parameter was only added in commit cf235511d2 on September 19th. Older versions assume the index file sits in the same directory as the BGEN file. Once you are able to upgrade, check out this blog post describing the changes; you will also have to reindex your files, as we changed the on-disk index format.
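Until you can upgrade, one possible workaround (just a sketch, not an official Hail recipe) is to stage symlinks in a directory you control, so the index ends up next to the BGEN file where older versions expect it. The stage_bgen helper below is hypothetical; it only helps if the symlinks are on a filesystem that all Spark workers can see.

```python
import os

def stage_bgen(bgen_path, index_path, staging_dir):
    """Symlink a BGEN file and its index into staging_dir so the index
    sits alongside the BGEN, which is where older Hail versions look.
    Returns the staged BGEN path to pass to hl.import_bgen."""
    os.makedirs(staging_dir, exist_ok=True)
    staged_bgen = os.path.join(staging_dir, os.path.basename(bgen_path))
    staged_idx = staged_bgen + ".idx"
    for src, dst in [(bgen_path, staged_bgen), (index_path, staged_idx)]:
        if not os.path.islink(dst):
            os.symlink(os.path.abspath(src), dst)
    return staged_bgen
```

You would then import from the staged path, e.g. hl.import_bgen(stage_bgen("data/bgen/chr22.bgen", "data/index/chr22.bgen.idx", "staging"), entry_fields=["GP"]), and Hail should find chr22.bgen.idx next to the staged file.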

Thanks for the information. Hopefully, we can get an updated version on our system.