Bgen and bgen index in different directories

jhchung · November 30, 2018, 8:47pm

Hi,

I have a bit of an unusual situation. Because of things that are out of my control, all of my input bgen and index files are in individual directories.

I’m wondering if it’s possible for the bgen and index files to be in different directories. I thought the index_file_map parameter would be the right option, but I’m having trouble getting it to work.

import hail as hl

# Map each BGEN file to the location of its index file.
bgen_index_map = {"data/bgen/chr22.bgen": "data/index/chr22.bgen.idx"}

mt = hl.import_bgen(path="data/bgen/chr22.bgen",
                    entry_fields=["GP"],
                    index_file_map=bgen_index_map)

> 2018-11-30 20:22:22 Hail: WARN: BGEN file `data/bgen/chr22.bgen' contains no sample ID block and no sample ID file given.
>   Using _0, _1, ..., _N as sample IDs.
> 2018-11-30 20:22:22 Hail: INFO: Number of BGEN files parsed: 1
> 2018-11-30 20:22:22 Hail: INFO: Number of samples in BGEN files: 487409
> 2018-11-30 20:22:22 Hail: INFO: Number of variants across all BGEN files: 1255683

These commands run without error, but as soon as I try to do anything with the result, I get a fatal error.

mt.s.show(5)

> FatalError: FileNotFoundException: data/bgen/chr22.bgen.idx

The error message makes it seem like Hail still looks for the index file in the same directory as the bgen file.

Any help with this would be great.

Thanks

jigold · November 30, 2018, 9:04pm

You’re correct that the index_file_map parameter should support having index files in a different location than the BGEN data.

Could you please post the full stack trace?

jigold · November 30, 2018, 9:10pm

Also, how did you create your index files? In the most recent version of Hail, the default index extension is .idx2.

jhchung · November 30, 2018, 9:44pm

Unfortunately, I am not using the latest version of Hail. The version I have available is build devel-f2b0dca9f506. I’m trying to get my informatics group to update the package.

Stack trace:

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<command-37581> in <module>()
----> 1 mt.s.show(5)

/databricks/spark/python/hail/typecheck/check.py in wrapper(*args, **kwargs)
    545         def wrapper(*args, **kwargs):
    546             args_, kwargs_ = check_all(f, args, kwargs, checkers, is_method=is_method)
--> 547             return f(*args_, **kwargs_)
    548 
    549         update_wrapper(wrapper, f)

/databricks/spark/python/hail/expr/expressions/base_expression.py in show(self, n, width, truncate, types)
    684             Print an extra header line with the type of each field.
    685         """
--> 686         print(self._show(n, width, truncate, types))
    687 
    688     def _show(self, n=10, width=90, truncate=None, types=True):

/databricks/spark/python/hail/expr/expressions/base_expression.py in _show(self, n, width, truncate, types)
    707         if source is not None:
    708             name = source._fields_inverse.get(self, name)
--> 709         t = self._to_table(name)
    710         if t.key is not None and name in t.key:

jigold · November 30, 2018, 10:01pm

Unfortunately, the feature for the index_file_map parameter was added in commit cf235511d2 on September 19th. Older versions assume the index file is in the same place as the BGEN file. Once you are able to upgrade your version, check out this blog post describing the changes. You will have to reindex your files as we changed the on-disk format.
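For anyone landing here after upgrading, the following is a rough sketch of the reindex-then-import flow with the indexes kept in a separate directory. The helper `build_index_file_map` and the function `reindex_and_import` are hypothetical names made up for illustration, and the paths are the ones from the original question. It assumes a Hail build recent enough that both `hl.index_bgen` and `hl.import_bgen` accept `index_file_map`, and that index paths end in `.idx2`.

```python
import posixpath


def build_index_file_map(bgen_paths, index_dir, extension=".idx2"):
    """Hypothetical helper: map each BGEN path to an index path under index_dir."""
    return {
        path: posixpath.join(index_dir, posixpath.basename(path) + extension)
        for path in bgen_paths
    }


def reindex_and_import(bgen_paths, index_dir):
    """Sketch only: assumes a Hail build that supports index_file_map."""
    import hail as hl  # imported lazily so the helper above works without Hail

    index_file_map = build_index_file_map(bgen_paths, index_dir)
    # Write the new-format index files into the separate directory.
    hl.index_bgen(bgen_paths, index_file_map=index_file_map)
    # Pass the same map on import so Hail looks for the indexes there.
    return hl.import_bgen(bgen_paths,
                          entry_fields=["GP"],
                          index_file_map=index_file_map)


index_map = build_index_file_map(["data/bgen/chr22.bgen"], "data/index")
print(index_map)
```

Building the map once and passing the same dict to both calls keeps the index locations consistent between the indexing and import steps.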

jhchung · December 7, 2018, 7:12pm

Thanks for the information. Hopefully, we can get an updated version on our system.
