Compound heterozygote analysis?

Greetings, I am trying to do a compound heterozygote analysis on a reasonably large dataset (~600k variants, 1k samples). Based on an earlier post on the forum, I am trying to use this code:
mt = mt.annotate_cols(
    hets=hl.agg.group_by(mt.gene_symbol, hl.agg.filter(mt.GT.is_het(), hl.agg.collect(mt.hgvs)))
)
However, I am unable to even compute the hets field, presumably because I have a large number of genes and samples. Is there a way to keep only the genes with more than one heterozygous call, or is there a built-in function in Hail for this purpose?

I think you’ll have a better time with a two-step process:

mt = mt.group_rows_by(mt.gene_symbol).aggregate(
    compound_hets = hl.agg.filter(mt.GT.is_het(), hl.agg.collect(mt.hgvs))
)

Note: this uses group_rows_by which is a shuffling operation. That means you’ll want to use non-preemptible / non-spot VMs if you’re running on a cloud cluster. More details on shuffling here.
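To make the per-gene grouping concrete, here is a plain-Python model of what the `group_rows_by(...).aggregate(...)` step computes. It is not Hail code. For every (gene, sample) pair, it collects the HGVS strings of that sample's heterozygous calls. The toy rows, the dict layout, and the helper names `is_het` and `group_hets_by_gene` are hypothetical and only illustrate the logic.

```python
# Toy stand-in for a MatrixTable (hypothetical layout): each row is a variant
# with its gene symbol, HGVS string and one diploid genotype per sample.
variants = [
    {"gene_symbol": "GENE1", "hgvs": "c.100A>G", "GT": {"s1": (0, 1), "s2": (0, 0)}},
    {"gene_symbol": "GENE1", "hgvs": "c.200C>T", "GT": {"s1": (0, 1), "s2": (0, 1)}},
    {"gene_symbol": "GENE2", "hgvs": "c.50G>A", "GT": {"s1": (1, 1), "s2": (0, 1)}},
]


def is_het(gt):
    """Model of GT.is_het() for a diploid call: the two alleles differ."""
    return gt[0] != gt[1]


def group_hets_by_gene(rows):
    """Return {(gene, sample): [hgvs, ...]}, one entry per gene/sample pair.

    Every pair gets an entry (possibly an empty list), mirroring how grouping
    rows by gene keeps every sample column in the result.
    """
    entries = {}
    for row in rows:
        for sample, gt in row["GT"].items():
            hets = entries.setdefault((row["gene_symbol"], sample), [])
            if is_het(gt):  # the hl.agg.filter(mt.GT.is_het(), ...) part
                hets.append(row["hgvs"])  # the hl.agg.collect(mt.hgvs) part
    return entries


compound_hets = group_hets_by_gene(variants)
```

Here sample `s1` ends up with two heterozygous HGVS entries in `GENE1`, while its homozygous-alternate call in `GENE2` is filtered out.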

Thank you very much, danking. Since this annotates entries, is there any way I can annotate the columns with the genes that have more than one heterozygous call? I am still new to this and would appreciate any help!

Thank you, danking! Is there a way to aggregate by column (sample) to get the genes that have more than one heterozygous call?
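One option that may work on the grouped table, offered as a sketch that has not been tested here, is a column annotation that keeps only the genes whose `compound_hets` list has more than one entry, for example `mt = mt.annotate_cols(multi_het_genes=hl.agg.filter(hl.len(mt.compound_hets) > 1, hl.agg.collect(mt.gene_symbol)))`. Inside `annotate_cols`, aggregators run over rows, which after grouping are genes. The plain-Python model below shows the same filter-then-collect logic on hypothetical data. `multi_het_genes` and `genes_with_multiple_hets` are illustrative names, not part of Hail.

```python
# Hypothetical output of the per-gene grouping step:
# {(gene, sample): [HGVS strings of that sample's het calls in that gene]}
entries = {
    ("GENE1", "s1"): ["c.100A>G", "c.200C>T"],
    ("GENE1", "s2"): ["c.200C>T"],
    ("GENE2", "s1"): [],
    ("GENE2", "s2"): ["c.50G>A"],
}


def genes_with_multiple_hets(entries):
    """Return {sample: [genes with more than one het call]}.

    Every sample gets a list (possibly empty), mirroring a column annotation
    that aggregates over all rows (genes) for each column (sample).
    """
    per_sample = {}
    for (gene, sample), hets in entries.items():
        genes = per_sample.setdefault(sample, [])
        if len(hets) > 1:  # the hl.len(mt.compound_hets) > 1 filter
            genes.append(gene)  # the hl.agg.collect(mt.gene_symbol) part
    return per_sample


multi_het_genes = genes_with_multiple_hets(entries)
```

In this toy data only `s1` has a gene (`GENE1`) with two heterozygous calls; `s2` keeps an empty list rather than being dropped.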