Hi!
Here I’m trying to perform gene-based GWAS, but after obtaining aggregated information entries fields was excluded, and I guess there should be a less complicated approach to do it.
mt.annotate_rows(\
non_miss = hl.agg.group_by(mt.pheno.cases_controls, hl.agg.count_where(hl.is_defined(mt.GT))),
hom_ref = hl.agg.group_by(mt.pheno.cases_controls, hl.agg.count_where(mt.GT.is_hom_ref())),
hom_var = hl.agg.group_by(mt.pheno.cases_controls, hl.agg.count_where(mt.GT.is_hom_var())),
het = hl.agg.group_by(mt.pheno.cases_controls, hl.agg.count_where(mt.GT.is_het()))
)
mt_gene_based = mt.group_rows_by(gene = mt.kept_transcripts[3])\ # gene field from CSQ field
.aggregate_rows(non_miss_gene_cases = hl.agg.sum(mt.non_miss['case']),
non_miss_gene_controls = hl.agg.sum(mt.non_miss['control']),
hom_ref_gene_cases = hl.agg.sum(mt.hom_ref['case']),
hom_ref_gene_controls = hl.agg.sum(mt.hom_ref['control']),
hom_var_gene_cases = hl.agg.sum(mt.hom_var['case']),
hom_var_gene_controls = hl.agg.sum(mt.hom_var['control']),
het_gene_cases = hl.agg.sum(mt.het['case']),
het_gene_controls = hl.agg.sum(mt.het['control']),
).result()
p.s. there may be a more compact way to do the same.