I am trying to reproduce Hail’s pc_project output using PLINK’s --score. I am adding the variance-standardize option and keeping mean imputation. PLINK notes that “PCs will be scaled a bit differently from ref_data.eigenvec; you need to multiply or divide the PCs by a multiple of sqrt(eigenvalue) to put them on the same scale”. How does Hail scale the PCs? Is there a specific scaling equation that I can use to make my PLINK output match Hail’s?
I found the code published by Zhou et al. (doi: 10.1016/j.xgen.2022.100192) very helpful.
I was able to project my dataset using PLINK --score on loadings generated by Hail, using the following options:
Then I scaled the projection output sscore by dividing it by the square root of sscore.vars. I was wondering whether this scaling step is performed internally in Hail’s pc_project.
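For concreteness, the scaling step I applied can be sketched roughly as below. This is only a minimal illustration under two assumptions that may not hold for every setup: that "sscore.vars" refers to the number of variants listed in PLINK's .sscore.vars file, and that the score columns are sums rather than per-allele averages. The function name and the example values are hypothetical, and this is not taken from Hail's source.

```python
import math


def rescale_scores(score_sums, n_variants):
    """Divide every projected PC score by sqrt(n_variants).

    score_sums: list of per-sample lists, one value per PC (assumed to be sums).
    n_variants: number of variants used in scoring (assumed to be the line
                count of the .sscore.vars file).
    """
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    scale = math.sqrt(n_variants)
    return [[value / scale for value in row] for row in score_sums]


# Hypothetical values: two samples, two PCs, four variants used in scoring.
example = rescale_scores([[2.0, -4.0], [0.0, 6.0]], n_variants=4)
print(example)  # [[1.0, -2.0], [0.0, 3.0]]
```

If the scores were written as averages instead of sums, the scaling factor would differ, so it is worth checking which columns the .sscore file actually contains.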
Detailed documentation on how to replicate Hail’s outputs using PLINK would be helpful for the future, as many users cannot install Hail on their local servers.