Dear Hail team,
I hope this question has not already been asked; at least, I could not find an answer. I am trying to get familiar with Hail by doing the simple filtering and other operations I usually do on VCFs with tools like bcftools.
I am working with Hail version 0.2.57-582b2e31b8bd.
I split my VCF from multiallelic to biallelic sites with:
data_tmp_bi = hl.split_multi_hts(data_tmp)
Then I want to update the allele counts in the same way as shown in your documentation:
data_tmp_bi = data_tmp_bi.annotate_rows(info = hl.struct(AC=data_tmp_bi.info.AC[data_tmp_bi.a_index - 1],**data_tmp_bi.info))
but I get this error:
File "<ipython-input-29-3595a23add68>", line 1, in <module>
data_tmp_bi = data_tmp_bi.annotate_rows(info = hl.struct(AC=data_tmp_bi.info.AC[data_tmp_bi.a_index - 1],**data_tmp_bi.info))
TypeError: struct() got multiple values for keyword argument 'AC'
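My guess is that this is plain Python behaviour rather than something specific to Hail, though I may be wrong. Here is a minimal sketch that reproduces the same message without Hail. The `struct` function, the `info` dict and the `a_index` value are hypothetical stand-ins I made up for `hl.struct`, `data_tmp_bi.info` and `data_tmp_bi.a_index`:

```python
def struct(**kwargs):
    """Hypothetical stand-in for hl.struct: simply returns its keyword arguments."""
    return dict(kwargs)

# Made-up values playing the role of data_tmp_bi.info and data_tmp_bi.a_index
info = {"AC": [3, 5], "DP": 40}
a_index = 2

# 'AC' is passed explicitly AND again through **info, so Python rejects the call
try:
    struct(AC=info["AC"][a_index - 1], **info)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Merging into a single dict first lets the later 'AC' override the earlier one
merged = struct(**{**info, "AC": info["AC"][a_index - 1]})
```

If that is indeed the cause, I assume something like `data_tmp_bi.info.annotate(AC=...)` could overwrite the field directly in Hail, but I have not verified this on my version.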
The workaround I have been using is to create a new info field called ‘AC2’, drop the ‘AC’ field, recreate ‘AC’ from ‘AC2’ with annotate_rows, and finally drop ‘AC2’, which is rather long:
data_tmp_bi = data_tmp_bi.annotate_rows(info = hl.struct(AC2=data_tmp_bi.info.AC[data_tmp_bi.a_index - 1],**data_tmp_bi.info))
data_tmp_bi = data_tmp_bi.annotate_rows(info=data_tmp_bi.info.drop('AC'))
data_tmp_bi = data_tmp_bi.annotate_rows(info = hl.struct(AC=data_tmp_bi.info.AC2, **data_tmp_bi.info))
data_tmp_bi = data_tmp_bi.annotate_rows(info=data_tmp_bi.info.drop('AC2'))
On top of this, I filter out some samples with filter_cols, so I then want to update fields like the allele count again. I would still need the same workaround as above; otherwise I get the same error.
Do you have an idea of what the problem might be? Or is there a better way of doing this than the one I am using?