Count all members in an array field with counter aggregator for whole table

I have an entry field in a matrix table with two string values and I want to count them with the hl.agg.acounter but cannot figure out how to do this for each entry in the array.

ht = hl.utils.range_table(10)
ht = ht.annotate(arr = hl.range(0, 2).map(lambda _: hl.if_else(hl.rand_bool(0.5),'REF','PTV')))
ht.show()
idx.           arr
int32	array<str>
0	["REF","PTV"]
1	["PTV","REF"]
2	["REF","REF"]
3	["PTV","PTV"]
4	["PTV","PTV"]
5	["REF","REF"]
6	["PTV","REF"]
7	["REF","REF"]
8	["REF","REF"]
9	["PTV","PTV"]

I would like to be able to count each occurance of ‘REF’ and ‘PTV’ with hl.agg.counter or something so I could get this:

{'PTV': 9, 'REF': 11}

I get close with something like this:

ht.aggregate(hl.agg.array_agg(lambda x: hl.agg.counter(x),ht.arr))
[{'PTV': 5, 'REF': 5}, {'PTV': 4, 'REF': 6}]

But then I need to run through that list and add each keys value etc…

Or this, but then I need to know the actual field values, which I don’t

ht.aggregate(
    hl.dict([
        ('REF', hl.agg.sum(ht.arr.filter(lambda x: x == 'REF').size())),
        ('PTV', hl.agg.sum(ht.arr.filter(lambda x: x == 'PTV').size()))
    ])
)
{'PTV': 9, 'REF': 11}

Is there an easier way to do this?

This is exactly what hl.agg.explode is for!

ht.aggregate(hl.agg.explode(lambda element: hl.agg.counter(element), ht.arr))
1 Like

I had experimented with Explode but did not get anywhere. This works great!

The “control flow” aggregators are filter (aggregate records conditionally, according to some predicate), explode (apply an aggregation to each element of a collection), group_by (apply an aggregation to each record, grouped according to some other value, and array_agg (aggregate elements of a fixed-size array by position, returning an array of results for each index).

These can be nested arbitrarily:

hl.agg.filter(PREDICATE, 
              hl.agg.group_by(KEY,
                              hl.agg.explode(lambda x: INNER_AGG, ARRAY))

is totally valid.