Hi,
I’m getting an error and I don’t know why. As far as I know, BGEN files don’t require indexing before importing (and I’ve never seen a .bgen.idx file), so can someone tell me what’s happening here?
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 2.1.0
SparkUI available at http://10.2.213.83:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  /  _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-6e815ac
2017-11-05 12:56:53 Hail: INFO: Number of BGEN files parsed: 23
2017-11-05 12:56:53 Hail: INFO: Number of samples in BGEN files: 7135
2017-11-05 12:56:53 Hail: INFO: Number of variants across all BGEN files: 40405505
Traceback (most recent call last):
File "regression1.py", line 22, in <module>
hc.import_bgen('/mnt/volume/imputed_genotypes/*.bgen', sample_file='/mnt/volume/imputed_genotypes/MT_chr1.sample').split_multi().write('/mnt/volume/imputed_genotypes/MT.vds')
File "<decorator-gen-476>", line 2, in import_bgen
File "/usr/local/hail/python/hail/java.py", line 121, in handle_py4j
'Error summary: %s' % (deepest, full, Env.hc().version, deepest))
hail.java.FatalError: FileNotFoundException: File file:/mnt/volume/imputed_genotypes/MT_chr7.bgen.idx does not exist
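For reference, line 22 of regression1.py is the chained call shown in the traceback. A minimal sketch of the relevant part of the script (assuming Hail 0.1's HailContext API, with hc constructed as in the log above):

from hail import HailContext

hc = HailContext()

# Import all per-chromosome BGEN files, split multiallelic variants,
# and write the result out as a single VDS
(hc.import_bgen('/mnt/volume/imputed_genotypes/*.bgen',
                sample_file='/mnt/volume/imputed_genotypes/MT_chr1.sample')
   .split_multi()
   .write('/mnt/volume/imputed_genotypes/MT.vds'))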
The BGEN files were created by converting the imputed VCFs (the output of the Sanger imputation pipeline) to BGEN using QCtool v2 rc6:
# Convert each per-chromosome imputed VCF to BGEN, reading genotype probabilities from the GP field
for i in {1..22} X
do
  qctool2 -g /mnt/volume/imputed_genotypes/MT.vcfs/$i.vcf.gz -og MT_chr$i.bgen -os MT_chr$i.sample -assume-chromosome $i -vcf-genotype-field GP
done