When I import a vcf file, and annotate each sample with population, age, gender and so on. Then how to stat each variant AC accross population, e.g. for variant1, AC of AFR, AMR and ASJ(The expected results showed below). I think it’s not efficient to filter each population and then calculate.
AC
AFR AMR ASJ …
variant1 10,2 5,5 3,20
variant2 2,32 1,5 5,10
I think it’s not efficient to filter each population and then calculate
This should be fine:
mt = mt.annotate_rows(
callstats_afr = hl.agg.call_stats(hl.agg.filter(mt.pop == 'AFR', mt.GT), mt.alleles),
callstats_eur = hl.agg.call_stats(hl.agg.filter(mt.pop == 'EUR', mt.GT), mt.alleles),
...)
You can also use the experimental aggregator group_by functionality:
mt = mt.annotate_rows(call_stats = hl.agg.group_by(mt.pop, hl.agg.call_stats(mt.GT, mt.alleles)))
Thanks. I have tried it on a test vcf, the remain question is that it cannot get/calculate the heterozygosis genotype counts when the allele number is over than 3
you can compute this separately:
mt = mt.annotate_rows(het_count = hl.agg.count_where(mt.GT.is_het()))
If you want to filter to AN > 3:
mt = mt.filter_rows(mt.call_stats.AN > 3) # or what ever population
Maybe I haven’t express clear. For example there are four alleles in a site, ‘A’ for ref, and [‘G’,‘C’,‘T’] for alternative bases, the homo-genotype counts can be cal by call_stats function, then how can I get the follow genotypes counts, AG, AC, AT …? Expect your reply, thanks
annotate_rows(GC = hl.agg.counter(mt.GT))
should do it.
Similar to this, I tried call_stats function for case and control stats but I am getting this assertion error. Any suggestions?
/restricted/projectnb/ukbiobank/ad/analysis/ad.v1/ad_parent_apoe_gt.py in <module>()
73 mt = mt.filter_rows(mt.variant_qc.AF[0] <= (1.0-maf_cutoff))
74 mt = mt.annotate_rows(
---> 75 case = hl.agg.call_stats(hl.agg.filter(mt.pheno.parent_AD_binary == 1, mt.GT), mt.alleles),
76 control = hl.agg.call_stats(hl.agg.filter(mt.pheno.parent_AD_binary == 0, mt.GT), mt.alleles)
77 )
<decorator-gen-995> in filter(condition, aggregation)
/restricted/projectnb/genpro/github/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
557 @decorator
558 def wrapper(__original_func, *args, **kwargs):
--> 559 args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
560 return __original_func(*args_, **kwargs_)
561
/restricted/projectnb/genpro/github/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py in check_all(f, args, kwargs, checks, is_method)
485 if i < len(args):
486 arg = args[i]
--> 487 args_.append(checker.check(arg, name, arg_name))
488 # passed as keyword
489 else:
/restricted/projectnb/genpro/github/hail/hail/build/distributions/hail-python.zip/hail/expr/aggregators/aggregators.py in check(self, x, caller, param)
24 def check(self, x, caller, param):
25 x = self.coercer.check(x, caller, param)
---> 26 assert(len(x._ir.search(lambda node: isinstance(node, BaseApplyAggOp))) != 0)
27 return x
28
AssertionError:
ah, still need to fix that error message.
See here: