How to stat AC for each population


#1

When I import a vcf file, and annotate each sample with population, age, gender and so on. Then how to stat each variant AC accross population, e.g. for variant1, AC of AFR, AMR and ASJ(The expected results showed below). I think it’s not efficient to filter each population and then calculate.
AC
AFR AMR ASJ …
variant1 10,2 5,5 3,20
variant2 2,32 1,5 5,10


#2

I think it’s not efficient to filter each population and then calculate

This should be fine:

mt = mt.annotate_rows(
    callstats_afr = hl.agg.call_stats(hl.agg.filter(mt.pop == 'AFR', mt.GT), mt.alleles),
    callstats_eur = hl.agg.call_stats(hl.agg.filter(mt.pop == 'EUR', mt.GT), mt.alleles),
...)

You can also use the experimental aggregator group_by functionality:

mt = mt.annotate_rows(call_stats = hl.agg.group_by(mt.pop, hl.agg.call_stats(mt.GT, mt.alleles)))

#3

Thanks. I have tried it on a test vcf, the remain question is that it cannot get/calculate the heterozygosis genotype counts when the allele number is over than 3


#4

you can compute this separately:

mt = mt.annotate_rows(het_count = hl.agg.count_where(mt.GT.is_het()))

If you want to filter to AN > 3:

mt = mt.filter_rows(mt.call_stats.AN > 3) # or what ever population

#5

Maybe I haven’t express clear. For example there are four alleles in a site, ‘A’ for ref, and [‘G’,‘C’,‘T’] for alternative bases, the homo-genotype counts can be cal by call_stats function, then how can I get the follow genotypes counts, AG, AC, AT …? Expect your reply, thanks


#6

annotate_rows(GC = hl.agg.counter(mt.GT)) should do it.