I have a bit of an unusual situation. Because of things outside my control, my input BGEN files and their index files are stored in separate directories.
I'm wondering if it's possible for the BGEN and index files to be in different directories. I thought the `index_file_map` parameter might be the way to do this, but I'm having trouble getting it to work.
```python
import hail as hl

# Map the BGEN file to its index, which lives in a different directory.
bgen_index_map = {"data/bgen/chr22.bgen": "data/index/chr22.bgen.idx"}

mt = hl.import_bgen(path="data/bgen/chr22.bgen",
                    entry_fields=["GP"],
                    index_file_map=bgen_index_map)
```
> 2018-11-30 20:22:22 Hail: WARN: BGEN file `data/bgen/chr22.bgen' contains no sample ID block and no sample ID file given.
> Using _0, _1, ..., _N as sample IDs.
> 2018-11-30 20:22:22 Hail: INFO: Number of BGEN files parsed: 1
> 2018-11-30 20:22:22 Hail: INFO: Number of samples in BGEN files: 487409
> 2018-11-30 20:22:22 Hail: INFO: Number of variants across all BGEN files: 1255683
These commands run without error, but as soon as I try to do anything with `mt`, I get a fatal error:
> FatalError: FileNotFoundException: data/bgen/chr22.bgen.idx
The error message makes it seem like Hail still looks for the index file in the same directory as the BGEN file.
Any help with this would be great.
You're correct that the `index_file_map` parameter should support having index files in a different location than the BGEN data.
Could you please post the full stack trace?
Also, how did you create your index files? The default extension for the most recent version of Hail is
Unfortunately, I am not using the latest version of Hail. The version I have available is build `devel-f2b0dca9f506`. I'm trying to get my informatics group to update the package.
```
FatalError Traceback (most recent call last)
<command-37581> in <module>()
----> 1 mt.s.show(5)
/databricks/spark/python/hail/typecheck/check.py in wrapper(*args, **kwargs)
545 def wrapper(*args, **kwargs):
546 args_, kwargs_ = check_all(f, args, kwargs, checkers, is_method=is_method)
--> 547 return f(*args_, **kwargs_)
549 update_wrapper(wrapper, f)
/databricks/spark/python/hail/expr/expressions/base_expression.py in show(self, n, width, truncate, types)
684 Print an extra header line with the type of each field.
--> 686 print(self._show(n, width, truncate, types))
688 def _show(self, n=10, width=90, truncate=None, types=True):
/databricks/spark/python/hail/expr/expressions/base_expression.py in _show(self, n, width, truncate, types)
707 if source is not None:
708 name = source._fields_inverse.get(self, name)
--> 709 t = self._to_table(name)
710 if t.key is not None and name in t.key:
```
Unfortunately, the `index_file_map` feature was added in commit cf235511d2 on September 19th. Older versions assume the index file is in the same location as the BGEN file. Once you are able to upgrade, check out this blog post describing the changes. You will have to reindex your files, since we changed the on-disk format.
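For reference, here is a minimal sketch of what reindexing into a separate directory could look like after upgrading. It assumes a Hail 0.2 release in which both `hl.index_bgen` and `hl.import_bgen` accept `index_file_map`, and that index files use the newer `.idx2` extension (check the docs for your version). `make_index_file_map` and `index_and_import` are hypothetical helpers for illustration, not part of Hail.

```python
import os


def make_index_file_map(bgen_paths, index_dir, ext=".idx2"):
    """Hypothetical helper: map each BGEN path to an index file with the
    same base name inside index_dir. The ".idx2" default is an assumption
    based on newer Hail releases."""
    return {
        path: os.path.join(index_dir, os.path.basename(path) + ext)
        for path in bgen_paths
    }


def index_and_import(bgen_paths, index_dir):
    """Hypothetical helper: reindex the BGEN files into index_dir, then
    import them using those indexes."""
    # Imported here so the helpers above can be loaded without Hail installed.
    import hail as hl

    index_map = make_index_file_map(bgen_paths, index_dir)
    # Write the index files at the mapped locations (needed after the
    # on-disk format change).
    hl.index_bgen(bgen_paths, index_file_map=index_map)
    # Pass the same mapping so Hail reads the indexes from index_dir.
    return hl.import_bgen(bgen_paths,
                          entry_fields=["GP"],
                          index_file_map=index_map)
```

For example, `make_index_file_map(["data/bgen/chr22.bgen"], "data/index")` gives `{"data/bgen/chr22.bgen": "data/index/chr22.bgen.idx2"}`. Building the map from a single helper keeps the keys identical to the paths passed to `import_bgen`, since the lookup is by exact path string.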
Thanks for the information. Hopefully, we can get an updated version on our system.