Hi
Hail newbie here. What would be the best way to mean-impute missing genotypes?
Thank you
Good question, sorry we missed it.
What I’d do is first compute the mean of each row. Assuming you have a matrix table mt, it’s:
mt = mt.annotate_entries(n_alt_alleles=mt.GT.n_alt_alleles())
mt = mt.annotate_rows(mean_gt = hl.agg.mean(mt.n_alt_alleles))
Now you want to fill in any missing values of n_alt_alleles with the mean you computed:
mt = mt.annotate_entries(n_alt_alleles=hl.coalesce(mt.n_alt_alleles, mt.mean_gt))
coalesce is a function that returns its first non-missing argument.
This replaces each missing genotype’s alternate allele count with the mean count for its row (variant). Let me know if that’s what you were looking for.
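If it helps to see the logic outside of Hail, here is a minimal plain-Python sketch of the same idea on a toy matrix. It is not Hail code: None stands in for a missing genotype, and the names coalesce, mean_impute_rows, and toy are made up for illustration. Leaving an all-missing row untouched is a choice made in this sketch only, not a statement about how Hail handles that case.

```python
from statistics import mean


def coalesce(*values):
    """Return the first argument that is not None (None models a missing value)."""
    for v in values:
        if v is not None:
            return v
    return None


def mean_impute_rows(matrix):
    """Replace each None in a row with the mean of that row's observed values."""
    imputed = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        # Sketch-only choice: a row with no observed values has no mean,
        # so its entries stay missing.
        row_mean = mean(observed) if observed else None
        imputed.append([coalesce(x, row_mean) for x in row])
    return imputed


# Rows are variants, columns are samples; entries are alternate allele counts.
toy = [
    [0, None, 1, 2, 2],
    [None, 1, None, 0],
    [None, None],
]
result = mean_impute_rows(toy)
for row in result:
    print(row)
```

Observed values are left as they are, and only the missing entries pick up the row mean, which is the same behaviour the coalesce call above is relying on.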