Mean impute missing genotypes

Hi
Hail newbie here. What would be the best way to mean-impute missing genotypes?

Thank you

Good question, sorry we missed it.

So what I’d do is first get the mean of each row (each variant). Assuming you have a matrix table mt, it’s:

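import hail as hl
mt = mt.annotate_entries(n_alt_alleles=mt.GT.n_alt_alleles())
# hl.agg.mean skips missing entries, so this is the mean over called genotypes only
mt = mt.annotate_rows(mean_gt=hl.agg.mean(mt.n_alt_alleles))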

Now you want to fill in any missing values of n_alt_alleles with the mean you computed:

mt = mt.annotate_entries(n_alt_alleles=hl.coalesce(mt.n_alt_alleles, mt.mean_gt))

coalesce is a function that returns its first non-missing argument.
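
If it helps to see the logic outside of Hail, here is a rough plain-Python sketch of the same two steps on a toy matrix. It is only an illustration, not how Hail implements anything: None stands in for a missing genotype, and the names coalesce, mean_impute_row and genotypes are made up for this example.

```python
def coalesce(*values):
    """Return the first argument that is not None (None plays the role of 'missing')."""
    for v in values:
        if v is not None:
            return v
    return None


def mean_impute_row(row):
    """Replace missing values in one row with the mean of that row's observed values."""
    observed = [x for x in row if x is not None]
    # If every value in the row is missing there is no mean, so the row stays missing.
    row_mean = sum(observed) / len(observed) if observed else None
    return [coalesce(x, row_mean) for x in row]


# Toy data: each inner list is one row (variant), each value is an alt allele count.
genotypes = [
    [0, 1, None],
    [2, None, 1, 0],
    [None, None],
]

imputed = [mean_impute_row(row) for row in genotypes]
```

Note that a row with no called genotypes has no mean to fall back on, so its entries stay missing.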

After this, n_alt_alleles holds the observed number of alternate alleles for each called genotype, and the row mean for each genotype that was missing. Let me know if that’s what you were looking for.