Aggregating a numeric array

Hello all. I am trying to process the Hail table from gnomAD v4. In the Hail table, there is a row field called freq with this schema:

Row fields:
    'locus': locus<GRCh38> 
    'alleles': array<str> 
    'freq': array<struct {
        AC: int32, 
        AF: float64, 
        AN: int32, 
        homozygote_count: int64
    }> 

I want to annotate a new row field that sums the entire ht.freq.AC array for each row. I tried this, but it didn’t work:

ht = ht.annotate(freq_AC_sum=hl.agg.array_sum(ht.freq.AC))
ExpressionException: 'Table.annotate: field 'freq_AC_sum'' does not support aggregation

May I please get some advice on how I can do that? Thank you very much.

Hi, did you find a solution? Thanks.

Apologies for the delayed response to both of you!

You can aggregate over an array expression a using a.aggregate. So the above example would be

ht = ht.annotate(freq_AC_sum=ht.freq.aggregate(lambda freq: hl.agg.sum(freq.AC)))

In the special case of computing the sum of an array, you can also use hl.sum. So if ht.AC were a top-level array field, you could do

ht = ht.annotate(freq_AC_sum=hl.sum(ht.AC))

I know the naming isn’t super clear, but the hl.agg.array_sum aggregator is for aggregating an array field across many rows, where the arrays all have the same length, producing a single array of the elementwise aggregates.
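
To make the distinction concrete, here is a small plain-Python sketch. It is not Hail code: the rows data and both helper names (per_row_sum, elementwise_sum) are made up for illustration, and it only models the shape of the results, assuming every row's array has the same length.

```python
# Plain-Python model of the two behaviours (no Hail required).
# Each inner list stands in for one row's array of AC values (hypothetical data).
rows = [
    [1, 2, 3],
    [10, 20, 30],
]


def per_row_sum(rows):
    """One scalar per row: the shape of result you get from summing
    within each row's array (what the question was asking for)."""
    return [sum(row) for row in rows]


def elementwise_sum(rows):
    """One array for the whole table: position i holds the sum of
    element i across all rows (the shape of an elementwise aggregate)."""
    return [sum(column) for column in zip(*rows)]


print(per_row_sum(rows))      # [6, 60]
print(elementwise_sum(rows))  # [11, 22, 33]
```

The first result has one value per row, which is why it fits in an annotate. The second collapses all rows into a single array, which is why it is an aggregation over the table rather than something you can attach to each row.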