I am trying to get the maximal independent set of phenotypes based on phenotype correlations I have generated from the UK Biobank (UKB). I have already made the pairs file as described in the documentation (Hail | Miscellaneous) and now want to prune the phenotypes down to a maximal independent set. I am trying several correlation thresholds to see how many phenotypes would remain at each one, but strangely I am getting the same number at the end for every threshold, even though the pairs lists being pruned have different lengths. I am directly following the example in the docs, so I want to make sure something isn’t going wrong. Advice would be appreciated!
Here’s the relevant code:
import hail as hl

ht = hl.read_table('Pheno_pairwise_correlations.ht')
# Keep only the pairs whose correlation exceeds each threshold
pairs10 = ht.filter(hl.float32(ht.cor) > 0.1)  # 96264 long for cor at 10% limit
pairs20 = ht.filter(hl.float32(ht.cor) > 0.2)  # 283680 long for cor at 20% limit
# keep=False returns the nodes to remove rather than the nodes to keep
related_to_remove10 = hl.maximal_independent_set(pairs10.i, pairs10.j, keep=False)  # 14165 long
related_to_remove20 = hl.maximal_independent_set(pairs20.i, pairs20.j, keep=False)  # 14165 long also
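
To make the question concrete, here is a minimal plain-Python sketch of the kind of pruning I have in mind. It is not Hail's implementation: `greedy_removed_nodes` and the toy correlations are hypothetical, and the greedy highest-degree rule is just an assumption for illustration. It shows that two different pairs lists can give the same number of removed nodes while the removed nodes themselves differ, so comparing the contents of the two results, not only their lengths, may be a useful sanity check.

```python
from collections import defaultdict


def greedy_removed_nodes(pairs):
    """Return a set of nodes whose removal leaves no connected pair.

    Toy greedy heuristic (an assumption, not Hail's algorithm):
    repeatedly remove the node with the most remaining neighbors,
    breaking ties by the larger node name so the result is deterministic.
    """
    neighbors = defaultdict(set)
    for i, j in pairs:
        if i == j:
            continue  # ignore self-pairs
        neighbors[i].add(j)
        neighbors[j].add(i)

    removed = set()
    while True:
        best = max(neighbors, key=lambda n: (len(neighbors[n]), str(n)), default=None)
        if best is None or not neighbors[best]:
            break  # no edges left, so the remaining nodes are independent
        for nb in neighbors.pop(best):
            neighbors[nb].discard(best)
        removed.add(best)
    return removed


# Hypothetical toy correlations: (phenotype i, phenotype j, correlation)
toy = [("a", "b", 0.15), ("b", "c", 0.25), ("c", "d", 0.12), ("d", "e", 0.30)]

pairs10 = [(i, j) for i, j, cor in toy if cor > 0.1]  # 4 pairs
pairs20 = [(i, j) for i, j, cor in toy if cor > 0.2]  # 2 pairs

removed10 = greedy_removed_nodes(pairs10)
removed20 = greedy_removed_nodes(pairs20)

print(len(pairs10), sorted(removed10))  # 4 ['b', 'd']
print(len(pairs20), sorted(removed20))  # 2 ['c', 'e']
```

In this toy case both thresholds remove two nodes, but not the same two, so equal counts alone do not show that the two results are identical.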
Thanks so much,