Maximal_independent_set

eatkinson · January 20, 2021, 11:56pm

Hi!

I am trying to get the maximum independent set of phenotypes based on phenotype correlations I have generated from the UKB. I have already made the pairs file as described in the documentation (Hail | Miscellaneous) and now am wanting to prune them to the max indep sets. I am trying several cor thresholds to see how many phenotypes would remain at different ones, but strangely am getting the same number in the max set at the end despite differing lengths of pairs lists to prune. I am directly following the example in the docs so want to make sure something isn’t going wrong. Advice would be appreciated!

Here’s the relevant code:

ht = hl.read_table(‘Pheno_pairwise_correlations.ht’)
pairs10 = ht.filter(hl.float32(ht.cor) > 0.1) #96264 long for cor at 10% limit
pairs20 = ht.filter(hl.float32(ht.cor) > 0.2) #283680 long for cor at 20% limit
related_to_remove10 = hl.maximal_independent_set(pairs10.i, pairs10.j, False) #14165 long
related_to_remove20 = hl.maximal_independent_set(pairs20.i, pairs20.j, False) #14165 long also

Thanks so much,
Elizabeth

danking · January 21, 2021, 2:46am

Hey @eatkinson !

It looks like you’re asking for the nodes that should be removed, i.e. the complement of the set of nodes to keep. Try changing the third argument of MIS to True. I like to use the keyword argument syntax so this sticks out a bit:

hl.maximal_independent_set(..., keep=True)

eatkinson · January 21, 2021, 5:23pm

Hi Dan!

Thanks for getting back to me. Yes, I am indeed trying to get a list of the ones to be removed that I will then exclude from the full dataset. This is how the process was presented in the docs, which have the following as the example:

Starting from the above pairs, prune individuals from a dataset until no close relationships remain:

related_samples_to_remove = hl.maximal_independent_set(pairs.i, pairs.j, False)
result = dataset.filter_cols(
… hl.is_defined(related_samples_to_remove[dataset.col_key]), keep=False)

As an aside, if I do change the 3rd argument to True, I end up with a list 0 long.

Elizabeth

danking · January 21, 2021, 6:27pm

Ah, OK, I understand. The first thing I would check is: are there any self-correlations?

ht.filter(ht.i == ht.j).show()

If a node has a “self-edge” (an edge to itself) then it cannot be in the independent set (it’s always connected to another node, namely itself). You can remove these with:

ht = ht.filter(ht.i != ht.j)

eatkinson · January 21, 2021, 7:07pm

oh makes sense! I just tried removing self-correlations (which indeed were there) and it results in different lengths now. Thanks so much!

Topic		Replies	Views
Maximal independent set behaving inconsistently Hail Query & hailctl	4	477	December 13, 2019
Best way to check relatedness in large sample sets Help [0.1]	4	1384	October 9, 2018
LD pruning and IBD Hail Query & hailctl	9	1235	November 10, 2023
Maximal_independent_set() guide correction Hail Query & hailctl	1	521	December 11, 2018
Hl.maximal_independent_set - job 'cancelled because SparkContext was shut down' Hail Query & hailctl	28	7604	February 4, 2021

Maximal_independent_set

Related topics