Plot.scatter not showing all samples when visualizing PCA

igorm · February 28, 2023, 3:23pm

Hi,

I have annotated 32 samples (among thousands) and ran PCA on the dataset, but only 11 of them show up on the scatter plot. Could this be a bug, or am I missing something?

Here is the code:

import hail as hl
from bokeh.io import show

len(mt.filter_cols(mt.group == "case").s.collect())
32 # the rest have no annotation in mt.group

eigenvalues, pcs, _ = hl.hwe_normalized_pca(mt.GT)
mt = mt.annotate_cols(scores = pcs[mt.s].scores)

p = hl.plot.scatter(mt.scores[0],
                    mt.scores[1],
                    hover_fields=dict([('name', mt.s)]),
                    label=mt.group,
                    title='PCA', xlabel='PC1', ylabel='PC2')
show(p)

When I turn off the NA group, I can count only 11 “case” samples on the plot. There should be 32. (If I use mt.filter_cols to create a new mt with only the annotated samples and then run PCA, I get all 32 on the plot.)

Thanks

danking · February 28, 2023, 8:26pm

Hail’s scatter plot, by default, collapses points that are directly on top of one another, since they wouldn’t render as separate points. I think the tooltip should indicate when a point stands for more than one sample. Is that the case?
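
To make the collapsing idea concrete, here is a toy model in plain Python. It is only a sketch and not Hail's actual downsampling code: the helper name `collapse_to_grid`, the grid-binning rule, and the sample coordinates are all made up for illustration. It bins points into an n_divisions by n_divisions grid and keeps one entry per (cell, label).

```python
from collections import defaultdict


def collapse_to_grid(points, n_divisions):
    """Group (x, y, label) points by grid cell and label.

    Hypothetical helper: a simplified stand-in for plot downsampling,
    not the algorithm Hail itself uses.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so a degenerate axis does not divide by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    cells = defaultdict(list)
    for x, y, label in points:
        # Clamp so the maximum coordinate lands in the last cell.
        col = min(int((x - x_min) / x_span * n_divisions), n_divisions - 1)
        row = min(int((y - y_min) / y_span * n_divisions), n_divisions - 1)
        cells[(col, row, label)].append((x, y))
    return cells


# Made-up coordinates: three "case" points sit very close together.
points = [
    (0.0, 0.0, "control"),
    (10.0, 10.0, "control"),
    (5.00, 5.00, "case"),
    (5.01, 5.01, "case"),
    (5.02, 5.03, "case"),
    (9.0, 1.0, "case"),
]

coarse = collapse_to_grid(points, n_divisions=10)
fine = collapse_to_grid(points, n_divisions=10_000)

# Number of distinct "case" markers that would be drawn at each resolution.
n_case_coarse = sum(1 for (_, _, label) in coarse if label == "case")
n_case_fine = sum(1 for (_, _, label) in fine if label == "case")
print(n_case_coarse, n_case_fine)
```

With the coarse grid the three nearby case points share one cell, so only 2 case markers remain; with the fine grid all 4 survive. The same effect would be stronger in a plot dominated by thousands of other samples, because the grid is scaled to the full range of the data.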

Hmm. It’s also possible that we don’t handle missing-value labels correctly. Can you try converting the missing-value labels to a string? e.g.: label=hl.coalesce(mt.group, "MISSING").
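
Roughly, the plotting call from the question would then look like the sketch below. The wrapper name `scatter_with_explicit_missing` is hypothetical, and passing `n_divisions=None` is an assumption on my part: as far as I recall from the `hl.plot.scatter` documentation it turns off downsampling so every point is collected, but please check the signature in your Hail version. It expects `mt` to already carry the `scores` and `group` column fields from the original post.

```python
def scatter_with_explicit_missing(mt):
    """Plot PC1 vs PC2 with missing group labels turned into a string.

    Hypothetical wrapper around the code in the question; `mt` must
    already have `scores` and `group` column fields.
    """
    # Imported lazily so the sketch can be loaded without starting Hail.
    import hail as hl
    from bokeh.io import show

    # Replace missing labels with an explicit category.
    label = hl.coalesce(mt.group, "MISSING")

    p = hl.plot.scatter(
        mt.scores[0],
        mt.scores[1],
        hover_fields={'name': mt.s},
        label=label,
        # Assumption: None disables downsampling (see note above).
        n_divisions=None,
        title='PCA', xlabel='PC1', ylabel='PC2',
    )
    show(p)
    return p
```

If all 32 case samples appear with this version, trying each change separately (only the `hl.coalesce` label, then only `n_divisions=None`) should tell you which of the two explanations applies.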
