Change distance metric in PCA


I need to run a PCA using the Jaccard distance instead of Euclidean distance. Is there any way to change the distance metric of pca() or hwe_normalized_pca()?



You want to do something like the algorithm proposed in this paper:

Unfortunately, this requires a totally different algorithm that isn’t really PCA, so it’s not something we can easily tweak in pca() or hwe_normalized_pca(). Perhaps @jbloom could comment further.
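For anyone who wants to experiment outside Hail in the meantime, here is a minimal sketch of one common way to get PCA-like coordinates from Jaccard distances: classical multidimensional scaling (also called principal coordinates analysis). This is not necessarily the algorithm from the paper, and it is not a Hail API. The function names `jaccard_distance_matrix` and `pcoa` are made up for illustration, and the sketch assumes you have already exported your calls as a small boolean samples-by-variants NumPy array (True where a sample carries a non-reference allele). It builds a dense samples-by-samples matrix, so it is only practical for modest sample counts.

```python
import numpy as np


def jaccard_distance_matrix(calls: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard distance between the rows of a boolean
    samples x variants matrix (hypothetical helper, not part of Hail)."""
    x = calls.astype(bool).astype(np.int64)
    intersection = x @ x.T                      # shared non-reference sites
    counts = x.sum(axis=1)                      # non-reference sites per sample
    union = counts[:, None] + counts[None, :] - intersection
    # Assumption: two samples with no non-reference calls at all are
    # treated as identical (similarity 1, distance 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.where(union > 0, intersection / union, 1.0)
    return 1.0 - similarity


def pcoa(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS: embed a distance matrix into k coordinates."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to get a Gram-like matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    # Non-Euclidean distances can give negative eigenvalues; clip them to zero.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy input: samples 0 and 1 share all their calls, sample 2 shares none.
calls = np.array(
    [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1]],
    dtype=bool,
)
dist = jaccard_distance_matrix(calls)
coords = pcoa(dist, k=2)
```

Each row of `coords` plays the role of a sample's PC scores. Unlike PCA, this route yields no variant loadings, which is part of why it cannot be obtained by swapping a metric inside pca().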


Yes, exactly that paper.


@jbloom what are your thoughts on this?


@patrick-schultz and I were actually investigating Jaccard similarity (and other generalizations of min-hashing) around the same time as that paper. He worked out some cool theoretical extensions, but they were challenging to implement in Hail with the infrastructure we had at the time. Patrick, any thoughts on where implementing this at the Python level might fit into the current development arc?


That would be awesome!