Clumping variants to calculate polygenic risk scores


#1

A function for clumping variants would be helpful for calculating polygenic risk scores. Clumping is a greedy algorithm, described in the PLINK documentation here: https://www.cog-genomics.org/plink/1.9/postproc#clump

Basically: 1) compute r^2 (LD) between pairs of variants; 2) rank the summary statistics by p-value, from most to least significant; 3) walk down the ranked variant list, storing variants along the way: if the current variant is not in LD with any stored variant, store it; otherwise discard it, since it is in LD with a more significant stored variant. This function would come in handy for some other tasks as well. Alternatively (even better, but probably much harder): LDpred.
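
To make the three steps concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not PLINK's implementation: the function name `clump_variants`, its parameters, and the default thresholds are hypothetical; r^2 is taken as the squared Pearson correlation of genotype dosages; and the physical distance window that PLINK also applies is left out.

```python
import numpy as np


def clump_variants(genotypes, p_values, r2_threshold=0.5, p_threshold=1.0):
    """Greedy LD clumping sketch (hypothetical helper, not a PLINK API).

    genotypes: (n_samples, n_variants) array of allele dosages.
    p_values:  (n_variants,) array of summary-statistic p-values.
    Returns the column indices of stored variants, most significant first.
    Assumes no variant is monomorphic (a constant column gives NaN r^2).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p_values = np.asarray(p_values, dtype=float)

    # Step 1: pairwise r^2 between variants (columns of the genotype matrix).
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2

    # Step 2: rank variants from most to least significant.
    order = np.argsort(p_values, kind="stable")

    # Step 3: walk down the list; store a variant only if it is not in LD
    # with any variant that has already been stored.
    stored = []
    for idx in order:
        if p_values[idx] > p_threshold:
            break  # everything after this point is less significant still
        if all(r2[idx, kept] < r2_threshold for kept in stored):
            stored.append(int(idx))
    return stored


# Toy data: variants 0 and 1 are in perfect LD, variant 2 is independent.
toy_genotypes = np.array([[0, 0, 0],
                          [0, 0, 1],
                          [1, 1, 0],
                          [1, 1, 1]])
toy_p_values = np.array([1e-6, 1e-8, 1e-3])
print(clump_variants(toy_genotypes, toy_p_values))  # [1, 2]
```

Computing the full r^2 matrix up front is quadratic in the number of variants, so a real implementation would presumably restrict comparisons to a window around each stored variant.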