Hi All,
I’m processing a VCF produced by the Dragen pipeline with Hail. When I use split_multi_hts() to split multi-allelic sites, I get an index-out-of-bounds error like the one below.
HailUserError: Error summary: HailException: array index out of bounds: index=10, length=7
------------
Hail stack trace:
File "<ipython-input-5-6c3d12705311>", line 2, in <module>
mt = hl.split_multi_hts(mt)
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/methods/statgen.py", line 2168, in split_multi_hts
(hl.range(0, 3).map(lambda i:
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/methods/statgen.py", line 2172, in <lambda>
).map(lambda j: split.PL[j]))))))
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/methods/statgen.py", line 2172, in <lambda>
).map(lambda j: split.PL[j]))))))
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/expr/expressions/typed_expressions.py", line 481, in __getitem__
return self._method("indexArray", self.dtype.element_type, item)
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/expr/expressions/base_expression.py", line 695, in _method
x = ir.Apply(name, ret_type, self._ir, *(a._ir for a in args))
File "/home/sonic/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/ir/ir.py", line 2628, in __init__
self.save_error_info()
I saw a case similar to mine (Error index out of bounds). In my query, the error above comes from the PL field. So, as the error message indicates, I filtered for calls whose PL has length 7 with the following code.
tmp.filter_entries(tmp.PL.length() == 7).entries().show()
And I found those calls were from the sex chromosome of male samples, so they were all haploid calls.
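To show why a PL of length 7 points to haploid calls, here is a small plain-Python sketch of the arithmetic (not Hail code). It uses the VCF convention that a haploid call has one PL entry per allele, while a diploid call has one per unordered genotype, ordered as k*(k+1)/2 + j for genotype j/k. The function names and the 7-allele site are only my illustration; I have not checked exactly which index split_multi_hts() computes internally.

```python
def pl_length(n_alleles: int, ploidy: int) -> int:
    """Number of PL entries (one per possible genotype) for a call."""
    if ploidy == 1:
        return n_alleles
    if ploidy == 2:
        return n_alleles * (n_alleles + 1) // 2
    raise ValueError("only haploid and diploid calls are handled in this sketch")


def diploid_pl_index(j: int, k: int) -> int:
    """Position of the unphased diploid genotype j/k in a VCF-ordered PL array."""
    j, k = sorted((j, k))
    return k * (k + 1) // 2 + j


n_alleles = 7  # hypothetical site: reference plus 6 alternate alleles

haploid_len = pl_length(n_alleles, ploidy=1)  # what a haploid call carries
diploid_len = pl_length(n_alleles, ploidy=2)  # what a diploid lookup expects
idx = diploid_pl_index(0, 4)                  # diploid genotype 0/4

print(haploid_len, diploid_len, idx)  # prints: 7 28 10
```

No diploid call can have a PL of length 7, since k*(k+1)/2 is never 7 for an integer k. Looking up a diploid genotype such as 0/4 (index 10) in a haploid PL array of length 7 would fail in the same way as the error message above.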
In my VCF, haploid calls are represented as haploid genotypes. This can cause trouble in split_multi_hts() because the function assumes that all calls are diploid. I still need to look into this further, but the Dragen pipeline basically represents haploid calls this way when producing a VCF, unlike GATK. Similarly, impute_sex() also does not work. I think this could be a problem for anyone who uses Hail to process Dragen-produced VCFs.
So here is my question: is there a way to convert haploid genotypes to diploid? Or is there any way to handle non-diploid calls when applying Hail functions?
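To make the question concrete, here is a rough plain-Python sketch of what converting a haploid call to a diploid one could mean. It is not Hail code and not a confirmed workaround. The function name haploid_to_diploid, the list-based representation of GT and PL, and the placeholder het_pl value for heterozygous genotypes are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def haploid_to_diploid(
    gt: List[int],
    pl: Optional[List[int]],
    n_alleles: int,
    het_pl: int = 1000,  # assumed placeholder penalty for heterozygous genotypes
) -> Tuple[List[int], Optional[List[int]]]:
    """Rewrite a haploid call as a homozygous diploid call.

    Calls that are not haploid are returned unchanged.
    """
    if len(gt) != 1:
        return gt, pl

    allele = gt[0]
    new_gt = [allele, allele]  # haploid "1" becomes diploid "1/1"

    if pl is None:
        return new_gt, None
    if len(pl) != n_alleles:
        raise ValueError("a haploid PL should have one entry per allele")

    # A diploid PL has one entry per unordered genotype, in VCF order.
    new_pl = [het_pl] * (n_alleles * (n_alleles + 1) // 2)
    for i, value in enumerate(pl):
        # Homozygous genotype i/i sits at index i*(i+1)/2 + i.
        new_pl[i * (i + 1) // 2 + i] = value
    return new_gt, new_pl


# Hypothetical haploid call at a site with 3 alleles.
gt, pl = haploid_to_diploid([1], [30, 0, 45], n_alleles=3)
print(gt)  # [1, 1]
print(pl)  # [30, 1000, 0, 1000, 1000, 45]
```

Whether giving the heterozygous genotypes an arbitrary large PL is acceptable for downstream analysis is part of what I am unsure about. An equivalent transformation inside Hail is what I am asking for.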