Plot.scatter not showing all samples when visualizing PCA

Hi,

I have annotated 32 samples (among thousands) and ran PCA on the dataset, but only 11 show up on the scatter plot. Could this be a bug, or am I missing something?

Here is the code:

len(mt.filter_cols(mt.group == "case").s.collect())
32 # the rest have no annotation in mt.group

eigenvalues, pcs, _ = hl.hwe_normalized_pca(mt.GT)
mt = mt.annotate_cols(scores = pcs[mt.s].scores)

p = hl.plot.scatter(mt.scores[0],
                    mt.scores[1],
                    hover_fields=dict([('name', mt.s)]),
                    label=mt.group,
                    title='PCA', xlabel='PC1', ylabel='PC2')
show(p)

When I turn off the NA group I can only count 11 “case” samples on the plot. There should be 32. (If I use mt.filter_cols to create a new mt with only the annotated samples and then run PCA, I get all 32 on the plot.)

Thanks

Hail’s scatter plot, by default, collapses points that are directly on top of one another, since they wouldn’t render as separate points. I think the tooltip should indicate that there are two samples. Is that the case?
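
To make the collapsing idea concrete, here is a minimal pure-Python sketch of grid-based downsampling. It is an illustration only and not Hail's actual implementation: the function name `downsample`, the tuple layout, and the one-point-per-cell-and-label rule are all assumptions made for the example.

```python
from collections import defaultdict


def downsample(points, n_divisions):
    """Group points by (grid cell, label); each group would be drawn as one dot.

    points: list of (x, y, label, name) tuples (hypothetical layout).
    Returns a dict mapping (x_cell, y_cell, label) -> list of sample names.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 if all points share a coordinate, to avoid dividing by zero.
    x_step = (max(xs) - x_min) / n_divisions or 1.0
    y_step = (max(ys) - y_min) / n_divisions or 1.0

    cells = defaultdict(list)
    for x, y, label, name in points:
        cell = (int((x - x_min) / x_step), int((y - y_min) / y_step), label)
        cells[cell].append(name)
    return dict(cells)


def visible_count(cells, label):
    """Number of dots that would be drawn for a given label."""
    return sum(1 for key in cells if key[2] == label)


points = [
    (0.0, 0.0, "control", "c0"),
    (1.0, 1.0, "control", "c1"),
    (0.5512, 0.5512, "case", "s1"),
    (0.5528, 0.5528, "case", "s2"),  # very close to s1
    (0.95, 0.15, "case", "s3"),
]

coarse = downsample(points, n_divisions=10)
fine = downsample(points, n_divisions=1000)
print(visible_count(coarse, "case"))  # 2: s1 and s2 share a cell
print(visible_count(fine, "case"))    # 3: a finer grid separates them
```

If something like this is what is happening, a coarse grid over thousands of samples could easily merge several of the 32 "case" points into one dot. Depending on your Hail version, hl.plot.scatter may accept an n_divisions argument (with None meaning collect every point); please check the documentation for the version you have installed before relying on it.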

Hmm. It’s also possible that we don’t handle missing-value labels correctly. Can you try converting the missing-value labels to a string? e.g.: label=hl.coalesce(mt.group, "MISSING").