How to stat AC for each population


When I import a vcf file, and annotate each sample with population, age, gender and so on. Then how to stat each variant AC accross population, e.g. for variant1, AC of AFR, AMR and ASJ(The expected results showed below). I think it’s not efficient to filter each population and then calculate.
variant1 10,2 5,5 3,20
variant2 2,32 1,5 5,10


I think it’s not efficient to filter each population and then calculate

This should be fine:

mt = mt.annotate_rows(
    callstats_afr = hl.agg.call_stats(hl.agg.filter(mt.pop == 'AFR', mt.GT), mt.alleles),
    callstats_eur = hl.agg.call_stats(hl.agg.filter(mt.pop == 'EUR', mt.GT), mt.alleles),

You can also use the experimental aggregator group_by functionality:

mt = mt.annotate_rows(call_stats = hl.agg.group_by(mt.pop, hl.agg.call_stats(mt.GT, mt.alleles)))


Thanks. I have tried it on a test vcf, the remain question is that it cannot get/calculate the heterozygosis genotype counts when the allele number is over than 3


you can compute this separately:

mt = mt.annotate_rows(het_count = hl.agg.count_where(mt.GT.is_het()))

If you want to filter to AN > 3:

mt = mt.filter_rows(mt.call_stats.AN > 3) # or what ever population


Maybe I haven’t express clear. For example there are four alleles in a site, ‘A’ for ref, and [‘G’,‘C’,‘T’] for alternative bases, the homo-genotype counts can be cal by call_stats function, then how can I get the follow genotypes counts, AG, AC, AT …? Expect your reply, thanks


annotate_rows(GC = hl.agg.counter(mt.GT)) should do it.


Similar to this, I tried call_stats function for case and control stats but I am getting this assertion error. Any suggestions?

/restricted/projectnb/ukbiobank/ad/analysis/ad.v1/ in <module>()
 73 mt = mt.filter_rows(mt.variant_qc.AF[0] <= (1.0-maf_cutoff))
 74 mt = mt.annotate_rows(
---> 75      case    = hl.agg.call_stats(hl.agg.filter(mt.pheno.parent_AD_binary == 1, mt.GT), mt.alleles),
 76      control = hl.agg.call_stats(hl.agg.filter(mt.pheno.parent_AD_binary == 0, mt.GT), mt.alleles)
 77     )

<decorator-gen-995> in filter(condition, aggregation)

/restricted/projectnb/genpro/github/hail/hail/build/distributions/ in wrapper(__original_func, *args, **kwargs)
    557     @decorator
    558     def wrapper(__original_func, *args, **kwargs):
--> 559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
    560         return __original_func(*args_, **kwargs_)

/restricted/projectnb/genpro/github/hail/hail/build/distributions/ in check_all(f, args, kwargs, checks, is_method)
    485                     if i < len(args):
    486                         arg = args[i]
--> 487                         args_.append(checker.check(arg, name, arg_name))
    488                     # passed as keyword
    489                     else:

/restricted/projectnb/genpro/github/hail/hail/build/distributions/ in check(self, x, caller, param)
     24     def check(self, x, caller, param):
     25         x = self.coercer.check(x, caller, param)
---> 26         assert(len( node: isinstance(node, BaseApplyAggOp))) != 0)
     27         return x



ah, still need to fix that error message.

See here: