ld_matrix for a specific variant/locus and radius



I was wondering whether it would be possible to get a function like ld_matrix that returns the LD between a single variant/locus and the neighbouring variants within a specified radius? That would be very helpful.




You only care about the results for a single variant? If the radius is small, load the dataset, use hl.filter_intervals to restrict it to the relevant window, collect the data locally, and then use numpy to compute X dot X^T, where X is the variants-by-samples genotype matrix.

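Something like the following (I haven’t tested this; you probably need to mean-center and variance-normalize the alt allele counts in the second select_entries step, and missing genotypes need handling too, e.g. by mean imputation):

import hail as hl
import numpy as np

mt = hl.read_matrix_table(...)
mt = hl.filter_intervals(mt, [... interval around variant of interest ...])  # takes a list of intervals
mt = mt.select_entries(n_alts=mt.GT.n_alt_alleles()).select_cols().select_rows()
mt = mt.select_entries(n_alts=... normalize alt allele count appropriately ...)
t = mt._localize_entries('entries', 'cols')  # private API: one row per variant, entries as an array
t = t.select(n_alts=t.entries.map(lambda entry: entry.n_alts))  # select needs a named field here
x = np.array([row.n_alts for row in t.collect()], dtype=float)  # missing calls (None) become NaN
ld_matrix = x @ x.T  # matrix product; x * x.T would be elementwise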
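To make the normalization step concrete, here is a small numpy-only sketch, assuming x is a variants-by-samples array of alt allele counts like the one collected above, with missing calls as NaN. It mean-imputes missing calls, centers each variant, scales each row to unit norm, and then X dot X^T gives Pearson correlations (r) directly. As far as I know this is the same kind of standardization Hail applies before computing correlations, but treat it as an illustration; ld_from_genotypes is a hypothetical helper, not a Hail function.

```python
import numpy as np


def ld_from_genotypes(n_alts):
    """Pearson r between variants (rows) from an alt-allele-count matrix.

    Hypothetical helper: rows are variants, columns are samples, and missing
    calls are NaN (None also becomes NaN via dtype=float).
    """
    x = np.asarray(n_alts, dtype=float)
    row_means = np.nanmean(x, axis=1, keepdims=True)
    x = np.where(np.isnan(x), row_means, x)  # mean-impute missing calls
    x = x - x.mean(axis=1, keepdims=True)    # mean-center each variant
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms = np.where(norms == 0, np.nan, norms)  # monomorphic variants -> NaN rows
    x = x / norms                            # each row now has unit norm
    return x @ x.T                           # matrix product, not elementwise


# Toy example: variant 1 copies variant 0, variant 2 is its mirror image.
g = np.array([[0, 1, 2, 1],
              [0, 1, 2, 1],
              [2, 1, 0, 1]])
r = ld_from_genotypes(g)
r2 = r ** 2  # r^2 is what regional plots usually show
```

With unit-norm rows the diagonal is 1, and the sign of r is kept, so square it if you want r^2.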


Hi Dan,

Apologies for the late reply, and thank you for your help (though I’m not sure I understand all of these steps).

Well, I’m interested in the results for a list of variants, so that I can produce a regional plot for each of them (BTW, that would be a great function to add to Hail :grin:)
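For a list of variants, one way to get the data for regional plots is to compute LD once for a region and then slice out, for each index variant, its r^2 with every variant within the radius. A minimal sketch, assuming a single chromosome and sorted, unique positions aligned with the rows of a precomputed LD matrix of r values; regional_ld is a hypothetical helper, not a Hail function:

```python
import numpy as np


def regional_ld(positions, ld, index_positions, radius):
    """For each index variant, return (neighbor positions, r^2 with the index).

    Hypothetical helper. Assumes one chromosome, sorted unique positions, and
    that row/column k of `ld` (Pearson r) corresponds to positions[k].
    """
    positions = np.asarray(positions)
    ld = np.asarray(ld, dtype=float)
    results = {}
    for pos in index_positions:
        i = int(np.searchsorted(positions, pos))
        if i == len(positions) or positions[i] != pos:
            raise KeyError(f"index position {pos} is not in `positions`")
        # Neighbors with pos - radius <= position <= pos + radius.
        lo = np.searchsorted(positions, pos - radius, side="left")
        hi = np.searchsorted(positions, pos + radius, side="right")
        results[pos] = (positions[lo:hi], ld[i, lo:hi] ** 2)
    return results


positions = [100, 150, 400, 1000]
ld = [[1.0, 0.8, 0.1, 0.0],
      [0.8, 1.0, 0.5, 0.2],
      [0.1, 0.5, 1.0, 0.3],
      [0.0, 0.2, 0.3, 1.0]]
plots = regional_ld(positions, ld, index_positions=[150], radius=300)
neighbors, neighbor_r2 = plots[150]  # x and y values for one regional plot
```

Each entry of the returned dict is one plot's worth of data: neighbor positions on the x axis and r^2 with the index variant on the y axis.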


@jbloom can provide a more detailed answer, but if you only want the band of the LD matrix near the diagonal (pairs of variants within a given base-pair radius) without computing the whole matrix, you can use something like this:

ld = hl.ld_matrix(mt.GT.n_alt_alleles(), mt.locus, BASE_PAIR_RADIUS)
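To picture what the radius argument does: only correlations between variants within the window are kept, so the result is a band around the diagonal rather than the full matrix. Below is a dense numpy illustration of that band, assuming one contig and an inclusive radius (both my assumptions). hl.ld_matrix itself returns a block-sparse BlockMatrix rather than a dense array, so this is for intuition only; banded_ld is a hypothetical name.

```python
import numpy as np


def banded_ld(ld, positions, radius):
    """Keep LD only for variant pairs at most `radius` bp apart; zero elsewhere.

    Dense illustration only. Assumes one contig and an inclusive radius.
    """
    positions = np.asarray(positions)
    dist = np.abs(positions[:, None] - positions[None, :])  # pairwise bp distances
    return np.where(dist <= radius, np.asarray(ld, dtype=float), 0.0)


full = np.ones((3, 3))
band = banded_ld(full, positions=[0, 100, 500], radius=200)
```

Here the variant at position 500 is more than 200 bp from the other two, so only its diagonal entry survives.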

@jbloom, is there a simple way to take this and produce the list of sub-matrices defined by a list of loci and base pair radii?
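For the list-of-sub-matrices idea, here is a numpy sketch, assuming the LD matrix (full or banded) fits in memory and its rows line up with an array of positions on one contig; ld_submatrices and its (center, radius) window format are hypothetical. On the Hail side, I believe BlockMatrix.filter(rows_to_keep, cols_to_keep), with row indices taken from mt.add_row_index(), is the closest building block, but I haven't checked that end to end.

```python
import numpy as np


def ld_submatrices(ld, positions, windows):
    """Return one square LD sub-matrix per (center_position, radius) window.

    Hypothetical helper. Assumes one contig, and that row/column k of `ld`
    corresponds to positions[k].
    """
    ld = np.asarray(ld)
    positions = np.asarray(positions)
    subs = []
    for center, radius in windows:
        idx = np.flatnonzero(np.abs(positions - center) <= radius)
        subs.append(ld[np.ix_(idx, idx)])  # rows and columns for this window
    return subs


ld = np.arange(16, dtype=float).reshape(4, 4)  # stand-in values
subs = ld_submatrices(ld, positions=[0, 100, 500, 520],
                      windows=[(100, 150), (510, 20)])
```

np.ix_ selects the same index set on both axes, so each window yields a square sub-matrix in the original variant order.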