group_rows_by: number of rows per group

When using group_rows_by, is there a way to get the group annotated with how many rows went into that group?

For example, when grouping by gene and aggregating the entries:

dataset_result = dataset.group_rows_by(dataset.gene).aggregate(
    n_non_ref = hl.agg.count_where(dataset.GT.is_non_ref())
)

I would also like dataset_result to have a row field holding the number of SNPs grouped into each gene.

I expect the same question applies to group_cols_by.

Yes, add an aggregate_rows step before aggregate:

dataset_result = dataset.group_rows_by(dataset.gene) \
    .aggregate_rows(n = hl.agg.count()) \
    .aggregate(n_non_ref = hl.agg.count_where(dataset.GT.is_non_ref()))
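To make clear what the two aggregations compute, here is a rough plain-Python sketch (not Hail code). The toy `rows` data and the helper `group_rows_by_gene` are made up for illustration: `n` ends up as one value per gene (the number of rows that went into the group), while `n_non_ref` is one value per gene and sample.

```python
from collections import defaultdict

# Toy stand-in for a matrix table (hypothetical data): each row is a variant
# with a gene label and one call per sample (0 = hom-ref, 1 = het, 2 = hom-var).
rows = [
    {"gene": "BRCA1", "calls": [0, 1, 0]},
    {"gene": "BRCA1", "calls": [2, 0, 0]},
    {"gene": "TP53", "calls": [0, 0, 1]},
]


def group_rows_by_gene(rows):
    """Return (rows per gene, non-ref calls per gene and sample)."""
    n = defaultdict(int)   # row-level aggregate: one count per gene
    n_non_ref = {}         # entry-level aggregate: one count per gene and sample
    for row in rows:
        gene = row["gene"]
        n[gene] += 1
        counts = n_non_ref.setdefault(gene, [0] * len(row["calls"]))
        for i, call in enumerate(row["calls"]):
            if call > 0:   # any non-reference call
                counts[i] += 1
    return dict(n), n_non_ref


n, n_non_ref = group_rows_by_gene(rows)
print(n)          # rows per gene
print(n_non_ref)  # non-ref calls per gene, per sample
```

In the Hail version, `n` would be a row field of dataset_result and `n_non_ref` an entry field.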