Running ld_matrix on multiallelic variants


I’m trying to calculate LD for all variants within a dataset using ld_matrix() (see the script here). The dataset contains multiallelic variants which have been split using split_multi_hts(), and I’m curious whether it’s appropriate to run ld_matrix on variants which have been split. For example, is there any situation where two common, independent alleles at the same site might interfere with one another in the LD calculation?

Thanks in advance!

Hey @KatalinaBobowik!

My first thought would be that it would make more sense to only include biallelic variants, since the ld_matrix method is just computing the windowed pairwise correlation between variants.
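To make the "pairwise correlation" point concrete, here is a minimal pure-Python sketch (not Hail code) of r between two rows that come from the same split site. The genotype calls are made up for illustration, and the `split_dosage` helper is hypothetical: it assumes that after splitting, each row counts only copies of its own alternate allele, which is how I understand the downcoding in split_multi_hts() to work.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_dosage(calls, alt):
    """Count copies of one alternate allele per sample (assumed split encoding)."""
    return [sum(allele == alt for allele in gt) for gt in calls]

# Hypothetical diploid calls at one triallelic site: 0 = ref, 1 and 2 = alts.
calls = [(0, 0), (0, 1), (1, 1), (0, 2), (2, 2), (1, 2), (0, 0), (0, 1)]

alt1 = split_dosage(calls, 1)  # row for the first alt after splitting
alt2 = split_dosage(calls, 2)  # row for the second alt after splitting

r = pearson_r(alt1, alt2)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

In this toy data the two split rows come out negatively correlated: a sample carries only two alleles at the site, so every copy of one alt leaves less room for the other. That constraint is a property of the encoding rather than of linkage between separate sites, so r2 between rows from the same original site probably needs to be read with care.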

I was just looking around a bit, and on the PLINK 2.0 linkage disequilibrium page it mentions:

Since two-variant r2 only makes sense for biallelic variants, these collapse multiallelic variants down to most common allele vs. the rest.
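As a rough illustration of what "most common allele vs. the rest" could look like, the sketch below recodes multiallelic calls into a single biallelic dosage. This is only a reading of the sentence quoted above, not PLINK’s implementation; the function name and the example calls are hypothetical, and tie-breaking between equally common alleles is left to whatever `Counter.most_common` returns.

```python
from collections import Counter

def collapse_to_major_vs_rest(calls):
    """Recode calls as the number of alleles that differ from the most common allele."""
    allele_counts = Counter(allele for gt in calls for allele in gt)
    major = allele_counts.most_common(1)[0][0]
    dosage = [sum(allele != major for allele in gt) for gt in calls]
    return major, dosage

# Hypothetical diploid calls at one triallelic site: 0 = ref, 1 and 2 = alts.
calls = [(0, 0), (0, 1), (1, 1), (0, 2), (2, 2), (1, 2), (0, 0), (0, 1)]

major, dosage = collapse_to_major_vs_rest(calls)
print(major, dosage)
```

With this encoding the site contributes one row to the correlation instead of one row per alternate allele.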

And this paper, “eLD: entropy-based linkage disequilibrium index between multiallelic sites”, makes it look like including multiallelic sites in LD calculations is a bit more involved. The abstract also states:

Commonly used LD indices such as r2 handle LD of biallelic variants for two sites.

I’m not 100% sure here, though, so it may be worth running ld_matrix twice, once on just the biallelic variants and once on the biallelic variants plus the split multiallelic variants, and comparing the results.
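A sketch of that comparison in Hail might look like the following. It assumes `mt_split` is a MatrixTable that has already been through split_multi_hts() (so it has the `was_split` row field) and that the region is small enough to pull the LD matrices into memory with `to_numpy()`. The function name and the default radius are placeholders, not taken from the original script.

```python
def ld_with_and_without_split(mt_split, radius=1_000_000):
    """Return LD matrices for all variants and for unsplit variants only.

    mt_split: MatrixTable already passed through hl.split_multi_hts().
    radius:   window radius in base pairs (placeholder default).
    """
    import hail as hl  # imported here so the sketch loads without Hail installed

    def ld(mt):
        # Windowed pairwise correlation of alt-allele counts.
        return hl.ld_matrix(mt.GT.n_alt_alleles(), mt.locus, radius=radius)

    # All rows: originally biallelic variants plus split multiallelic ones.
    ld_all = ld(mt_split).to_numpy()

    # Only rows that were biallelic before splitting.
    ld_biallelic = ld(mt_split.filter_rows(~mt_split.was_split)).to_numpy()

    # Per-row flags, in row order, to locate split alleles in ld_all.
    was_split = mt_split.was_split.collect()
    return ld_all, ld_biallelic, was_split
```

Because each entry is a pairwise correlation, entries between two originally biallelic variants would be expected to match across the two runs. The rows and columns flagged by `was_split` are the ones worth inspecting, especially pairs that come from the same original site.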


Thanks @pwc2, those are great resources and a very helpful approach. I’ll test running ld_matrix on biallelic variants only and then compare that to the results with multiallelic variants included.

Thanks again!