I have a somewhat unusual situation. For reasons outside my control, my input bgen files and their index files are in separate directories.
I’m wondering if it’s possible for the bgen and index files to be in different directories. I thought the index_file_map parameter might be the way to do this, but I’m having trouble getting it to work.
import hail as hl

# Map each BGEN file to the location of its index file.
bgen_index_map = {"data/bgen/chr22.bgen": "data/index/chr22.bgen.idx"}
mt = hl.import_bgen(path="data/bgen/chr22.bgen",
                    entry_fields=["GP"],
                    index_file_map=bgen_index_map)
> 2018-11-30 20:22:22 Hail: WARN: BGEN file `data/bgen/chr22.bgen' contains no sample ID block and no sample ID file given.
> Using _0, _1, ..., _N as sample IDs.
> 2018-11-30 20:22:22 Hail: INFO: Number of BGEN files parsed: 1
> 2018-11-30 20:22:22 Hail: INFO: Number of samples in BGEN files: 487409
> 2018-11-30 20:22:22 Hail: INFO: Number of variants across all BGEN files: 1255683
These commands run without error, but if I try to do anything with the result, I get a fatal error.
Unfortunately, I am not using the latest version of Hail. The version I have available is build devel-f2b0dca9f506. I’m trying to get my informatics group to update the package.
Stack trace:
---------------------------------------------------------------------------
FatalError Traceback (most recent call last)
<command-37581> in <module>()
----> 1 mt.s.show(5)
/databricks/spark/python/hail/typecheck/check.py in wrapper(*args, **kwargs)
545 def wrapper(*args, **kwargs):
546 args_, kwargs_ = check_all(f, args, kwargs, checkers, is_method=is_method)
--> 547 return f(*args_, **kwargs_)
548
549 update_wrapper(wrapper, f)
/databricks/spark/python/hail/expr/expressions/base_expression.py in show(self, n, width, truncate, types)
684 Print an extra header line with the type of each field.
685 """
--> 686 print(self._show(n, width, truncate, types))
687
688 def _show(self, n=10, width=90, truncate=None, types=True):
/databricks/spark/python/hail/expr/expressions/base_expression.py in _show(self, n, width, truncate, types)
707 if source is not None:
708 name = source._fields_inverse.get(self, name)
--> 709 t = self._to_table(name)
710 if t.key is not None and name in t.key:
Unfortunately, support for the index_file_map parameter was added in commit cf235511d2 on September 19th. Older versions assume the index file is in the same location as the BGEN file. Once you are able to upgrade, check out this blog post describing the changes. You will have to reindex your files, because we changed the on-disk format.
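As a rough sketch of what this could look like after upgrading: the helper below builds an index_file_map that points every BGEN file at an index in a separate directory, and the second function passes that same map to both hl.index_bgen and hl.import_bgen. The names build_index_file_map and index_and_import are hypothetical, made up for this illustration, and the ".idx2" suffix is an assumption about what newer Hail versions expect for index locations, so check the documentation for your version before relying on it.

```python
import posixpath


def build_index_file_map(bgen_paths, index_dir, suffix=".idx2"):
    """Map each BGEN path to an index path under index_dir.

    Hypothetical helper; the ".idx2" default suffix is an assumption
    about newer Hail versions, not something taken from this thread.
    """
    index_file_map = {}
    for bgen_path in bgen_paths:
        # Keep the BGEN file name and change only the directory and suffix.
        file_name = posixpath.basename(bgen_path)
        index_file_map[bgen_path] = posixpath.join(index_dir, file_name + suffix)
    return index_file_map


def index_and_import(bgen_paths, index_dir):
    """Write indexes to index_dir, then import using the same map."""
    import hail as hl  # imported here so the helper above works without Hail

    index_file_map = build_index_file_map(bgen_paths, index_dir)
    # Rebuild the indexes first, since the on-disk index format changed.
    hl.index_bgen(bgen_paths, index_file_map=index_file_map)
    return hl.import_bgen(bgen_paths,
                          entry_fields=["GP"],
                          index_file_map=index_file_map)


print(build_index_file_map(["data/bgen/chr22.bgen"], "data/index"))
# → {'data/bgen/chr22.bgen': 'data/index/chr22.bgen.idx2'}
```

Calling index_and_import(["data/bgen/chr22.bgen"], "data/index") would then return the MatrixTable. Building the map once and reusing it for both calls keeps the indexing and import steps pointing at the same index locations.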