Use `hl.struct` for aggregate_entries on grouped matrix table?

Hi,

I’m wondering whether it is possible to pass an hl.struct to aggregate_entries on a grouped matrix table, as with a regular MatrixTable. I would like to create a gene-summary table with multiple items per entry. For example: the total number of variants with an alternate allele, a flag for whether an individual has any alternate allele, the number of missing variants, and the number of non-missing variants.

# This works
non_ref_by_transcripts = (
  mt
  .group_rows_by(mt.transcript_id)
  .aggregate_entries(
     n_non_ref = hl.agg.count_where(mt.GT.is_non_ref())
  )
)

# Add flag if entry has any alternate allele
non_ref_by_transcripts = non_ref_by_transcripts.annotate_entries(
  has_non_ref = hl.if_else(non_ref_by_transcripts.n_non_ref >= 1, 1, 0)
)

# Count number of variants with missing genotype calls.
missing_by_transcripts = (
  mt
  .group_rows_by(mt.transcript_id)
  .aggregate_entries(
     n_missing_vars = hl.agg.count_where(hl.is_missing(mt.GT))
  )
)

# This doesn't work
mt_transcripts = (
  mt
  .group_rows_by(mt.transcript_id)
  .aggregate_entries(
     hl.struct(
       n_non_ref = hl.agg.count_where(mt.GT.is_non_ref()),
       n_missing_vars = hl.agg.count_where(hl.is_missing(mt.GT)),
       n_non_missing_vars = hl.agg.count_where(hl.is_defined(mt.GT))
     )
  )
)

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-b7ec0ba0b168> in <module>
      5                                  n_non_ref = hl.agg.count_where(mt.GT.is_non_ref()),
      6                                  n_missing=hl.agg.count_where(hl.is_missing(mt.GT)),
----> 7                                  n_non_missing=hl.agg.count_where(hl.is_defined(mt.GT)))
      8     )
      9     .result()

TypeError: aggregate_entries() takes 1 positional argument but 2 were given

Alternatively, is there a way to annotate entries with another MatrixTable? I.e., create a separate MT for each feature I want to calculate and merge them together.

Looks like the difference is that the aggregate_entries call that doesn’t work isn’t using a keyword argument. Try something like:

.aggregate_entries(
    my_struct = hl.struct( .....)
)

Or just provide multiple entries:

mt_transcripts = (
  mt
  .group_rows_by(mt.transcript_id)
  .aggregate_entries(
       n_non_ref = hl.agg.count_where(mt.GT.is_non_ref()),
       n_missing_vars = hl.agg.count_where(hl.is_missing(mt.GT)),
       n_non_missing_vars = hl.agg.count_where(hl.is_defined(mt.GT))
  )
)
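
For context on the error message, here is a minimal plain-Python sketch of why a positional hl.struct is rejected. It assumes aggregate_entries is declared with keyword arguments only (something like `**named_exprs`), which is what the TypeError suggests. `GroupedSketch` is a hypothetical stand-in, not Hail's real class, and plain dicts and ints stand in for Hail expressions.

```python
class GroupedSketch:
    """Hypothetical stand-in for a grouped matrix table (not Hail's class)."""

    def aggregate_entries(self, **named_exprs):
        # Assumed signature: keyword arguments only, so every
        # aggregation has to be given a field name.
        return dict(named_exprs)


grouped = GroupedSketch()

# Positional call: Python counts `self` plus the one value passed,
# which is why the message says "2 were given".
try:
    grouped.aggregate_entries({"n_non_ref": 3})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Keyword calls: one named struct-like value, or several named fields.
as_struct = grouped.aggregate_entries(summary={"n_non_ref": 3, "n_missing_vars": 1})
as_fields = grouped.aggregate_entries(n_non_ref=3, n_missing_vars=1)
```

With the first keyword form the counts end up nested under a single entry field (`summary` here, a made-up name); with the second they become separate top-level entry fields.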

Hi @johnc1231 and @danking,

Both of those options seem to work. I think the second solution fits best with what I’m thinking about.

Thanks!