Independent SNPs in PCA

Hi Hail team!

I have a large MatrixTable (mt), and I want to select independent SNPs to use in a PCA. Is it possible to select specific SNPs with Hail's PCA?

hl.pca doesn't do this internally, but it's certainly possible (and recommended) to filter to independent SNPs first.

Try:

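import hail as hl

mt = hl.variant_qc(mt)  # compute per-variant QC stats, including allele frequencies
mt = mt.filter_rows(hl.min(mt.variant_qc.AF) > 0.02)  # keep variants with minor allele frequency above 0.02, or whatever threshold
pruned = hl.ld_prune(mt.GT, r2=0.2)  # in current Hail 0.2 this takes a call expression and returns a table of variants to keep (biallelic only)
mt = mt.filter_rows(hl.is_defined(pruned[mt.row_key]))  # keep only the LD-pruned variants
eig, scores, _ = hl.hwe_normalized_pca(mt.GT)  # eigenvalues, per-sample scores, loadings (None unless compute_loadings=True)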
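
If it helps to see what "independent" means here, below is a rough pure-Python sketch of the idea behind LD pruning: drop any variant whose squared correlation (r^2) with an already-kept variant reaches a threshold. This is only a toy illustration, not Hail's algorithm. hl.ld_prune works within a base-pair window and is built to scale, whereas this compares every pair. The names r_squared and greedy_prune and the toy genotypes are made up for the example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: correlation is undefined, treat as 0
    return cov * cov / (var_x * var_y)


def greedy_prune(genotypes, r2_threshold=0.2):
    """Return indices of variants kept so that no kept pair has r^2 >= threshold.

    Hypothetical helper for illustration: it scans variants in order and
    compares each one against every variant kept so far (no windowing).
    """
    kept = []
    for i, g in enumerate(genotypes):
        if all(r_squared(g, genotypes[j]) < r2_threshold for j in kept):
            kept.append(i)
    return kept


# Toy data: rows are variants, columns are samples, values are alt-allele counts.
toy = [
    [0, 1, 2, 0, 1, 2],  # variant 0
    [0, 1, 2, 0, 1, 2],  # variant 1: identical to variant 0, so in perfect LD with it
    [1, 0, 1, 2, 0, 2],  # variant 2: uncorrelated with variant 0
]
kept = greedy_prune(toy)
print(kept)
```

With the toy data, variant 1 is dropped because it duplicates variant 0, and variants 0 and 2 are kept. On real data, use hl.ld_prune as above; the r2 and bp_window_size arguments control how aggressive the pruning is.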