I’m trying to calculate LD for all variants within a dataset using ld_matrix() (see the script here). The dataset contains multiallelic variants which have been split using split_multi_hts(), and I’m curious whether it’s appropriate to run ld_matrix() on variants which have been split. For example, is there any situation where two common, independent alleles at the same site might interfere with one another in the LD calculation?
My first thought is that it makes more sense to include only biallelic variants, since ld_matrix() just computes the windowed pairwise correlation between variants.
Commonly used LD measures such as r² are defined for a pair of biallelic sites.
I’m not 100% sure here, though, so it may be worth running ld_matrix() on just the biallelic variants and again on the biallelic variants plus the split multiallelic variants, then comparing the results.
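To make the concern about split alleles concrete, here is a minimal pure-Python sketch (not Hail code) of what the pairwise correlation looks like for two alt alleles that came from the same multiallelic site. It assumes random mating, and it assumes each split row counts only its own alt allele, with the other alt allele treated as reference. The allele frequencies and the helper names `simulate_split_site` and `pearson_r` are made up for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_split_site(n_samples, p_alt1, p_alt2, seed=0):
    """Simulate diploid genotypes at one triallelic site, then "split" it.

    Returns two lists of per-sample alt-allele counts (0, 1 or 2), one per
    split row. Each row counts only its own alt allele (assumption: the
    other alt allele is downcoded to reference in that row).
    """
    rng = random.Random(seed)
    alleles = [0, 1, 2]  # 0 = ref, 1 = first alt, 2 = second alt
    weights = [1 - p_alt1 - p_alt2, p_alt1, p_alt2]
    alt1_counts, alt2_counts = [], []
    for _ in range(n_samples):
        # Draw the two chromosomes independently (random mating assumption).
        genotype = rng.choices(alleles, weights=weights, k=2)
        alt1_counts.append(genotype.count(1))
        alt2_counts.append(genotype.count(2))
    return alt1_counts, alt2_counts


# Hypothetical allele frequencies for the two alt alleles.
P_ALT1, P_ALT2 = 0.3, 0.2

alt1, alt2 = simulate_split_site(20000, P_ALT1, P_ALT2)
r = pearson_r(alt1, alt2)

# Under these assumptions the two counts are multinomial, which gives
# r = -sqrt(p1 * p2 / ((1 - p1) * (1 - p2))).
expected_r = -math.sqrt(P_ALT1 * P_ALT2 / ((1 - P_ALT1) * (1 - P_ALT2)))

print(f"simulated r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"expected  r = {expected_r:.3f}")
```

Because a chromosome can carry only one allele at a site, the two split rows come out negatively correlated (r of roughly -0.33 for these frequencies), even though neither allele is tagging the other in the usual sense. Entries like this between rows split from the same site are probably the first thing to look at when comparing the biallelic-only matrix against the one that includes split variants.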
Thanks @pwc2, those are great resources and a very helpful approach. I’ll run ld_matrix() on biallelic variants only and then compare that to the results with the split multiallelic variants included.