Hi,
I have two basic questions that came up while trying to sum alleles across different samples.
- How can I convert VCF-like genotypes such as 0/0, 0/1, 1/1 to 0, 1, 2? Aggregating would then be straightforward.
Alternatively:
- Let's consider a phenotype table in which I assign my samples to different groups. How could I aggregate (sum) the genotypes per group?
Thanks for your help.
Thanks, that already helped. For 1., I could do:
mtf = mt.select_entries(GT = mt.GT.n_alt_alleles())
mtf.make_table().export("data.tsv")
which works out fine.
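For readers who want to see the mapping itself, here is a minimal plain-Python sketch of the conversion that n_alt_alleles() performs on each call. It is not Hail code; the helper below is a hypothetical stand-in that happens to share the name, and it assumes genotypes arrive as VCF-style strings with "/" or "|" separators and "." for a missing allele.

```python
from typing import Optional


def n_alt_alleles(gt: str) -> Optional[int]:
    """Count non-reference alleles in a VCF-style genotype string.

    Hypothetical helper for illustration only; returns None for a
    missing call such as "./.".
    """
    # Treat phased ("|") and unphased ("/") separators the same way.
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None  # missing call
    # Every allele index other than 0 is a non-reference allele.
    return sum(1 for a in alleles if int(a) != 0)


counts = [n_alt_alleles(g) for g in ["0/0", "0/1", "1/1", "1|0", "./."]]
print(counts)  # [0, 1, 2, 1, None]
```

With the genotypes encoded this way, summing over samples is an ordinary sum that skips the missing entries.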
Trying something similar for 2., I want to compute the statistic per variant. In principle, I get what I want when using:
mtf = mtf.annotate_entries(sumof_allels = mtf.GT.n_alt_alleles())
mtf=mtf.group_cols_by(mtf.pheno.ID).aggregate(allele_sum=hl.agg.sum(mtf.sumof_allels))
mtf.show()
However, when I then try to generate the ht and export it as above, I get the following error:
ValueError Traceback (most recent call last)
/tmp/ipykernel_1422147/3525226286.py in <module>
4 mtf=mtf.group_cols_by(mtf.pheno.ID).aggregate(allele_sum=hl.agg.sum(mtf.sumof_allels))
5 mtf.show()
----> 6 ht=mtf.make_table()
<decorator-gen-1322> in make_table(self, separator)
~/anaconda3/lib/python3.9/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
575 def wrapper(__original_func, *args, **kwargs):
576 args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)
578
579 return wrapper
~/anaconda3/lib/python3.9/site-packages/hail/matrixtable.py in make_table(self, separator)
4096 counts = Counter(col_keys)
4097 if counts[None] > 0:
-> 4098 raise ValueError("'make_table' encountered a missing column key; ensure all identifiers are defined.\n"
4099 " To fill in key index, run:\n"
4100 " mt = mt.key_cols_by(ck = hl.coalesce(mt.COL_KEY_NAME, 'missing_' + hl.str(hl.scan.count())))")
ValueError: 'make_table' encountered a missing column key; ensure all identifiers are defined.
To fill in key index, run:
mt = mt.key_cols_by(ck = hl.coalesce(mt.COL_KEY_NAME, 'missing_' + hl.str(hl.scan.count())))
I have already tried to solve it using key_by, but have not succeeded yet.
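The error message says that at least one column key is missing after grouping. One possible cause, which is an assumption since the thread does not confirm it, is that some samples have no pheno.ID, so they end up in a group whose key is missing. The plain-Python sketch below (not Hail code; sample names, groups, and the sum_by_group helper are all made up for illustration) shows how such a missing group key arises and how replacing it with a placeholder label, the idea behind the hl.coalesce hint in the message, removes it.

```python
# Hypothetical data: alt-allele counts per sample for two variants.
alt_counts = {
    "s1": [0, 1],
    "s2": [2, 1],
    "s3": [1, 0],
}
# Hypothetical phenotype table; s3 has no group assigned.
pheno_id = {"s1": "case", "s2": "case", "s3": None}


def sum_by_group(counts, groups, missing_label=None):
    """Sum per-variant alt-allele counts over the samples in each group."""
    totals = {}
    for sample, per_variant in counts.items():
        key = groups.get(sample)
        if key is None:
            key = missing_label  # stays None unless a label is given
        if key not in totals:
            totals[key] = [0] * len(per_variant)
        totals[key] = [t + c for t, c in zip(totals[key], per_variant)]
    return totals


raw = sum_by_group(alt_counts, pheno_id)
print(None in raw)  # True: one group has a missing key

fixed = sum_by_group(alt_counts, pheno_id, missing_label="missing")
print(fixed)  # {'case': [2, 2], 'missing': [1, 0]}
```

In Hail itself, the analogous step would presumably be either to drop columns with an undefined pheno.ID before group_cols_by (for example with filter_cols and hl.is_defined), or to apply the key_cols_by call quoted in the error message with the actual column key name in place of COL_KEY_NAME. Neither has been tested against this dataset.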
You could also try:
mtf.GT.export('data.tsv')