Maximal independent set behaving inconsistently

I’m running maximal_independent_set on a ht created using pc_relate but am seeing inconsistent filtering results.

Here is the code:

def rank_related_samples(
        relatedness_ht: hl.Table,
        qc_ht: hl.Table
) -> Tuple[hl.Table, Callable[[hl.expr.Expression, hl.expr.Expression], hl.expr.NumericExpression]]:

    def annotate_related_pairs(related_pairs: hl.Table, index_col: str) -> hl.Table:
        related_pairs = related_pairs.key_by(**related_pairs[index_col])
        return related_pairs.annotate(
            **{
                index_col: related_pairs[index_col].annotate(
                    dp_mean=hl.or_else(qc_ht[related_pairs.key].sample_qc.dp_stats.mean, -1.0)
                )
            }
        ).key_by()

    relatedness_ht = annotate_related_pairs(relatedness_ht, "i")
    relatedness_ht = annotate_related_pairs(relatedness_ht, "j")

    def tie_breaker(l, r):
        return (l.dp_mean - r.dp_mean)

    return relatedness_ht, tie_breaker

and

data_source = 'broad'
freeze = 4
relatedness_ht = hl.read_table(relatedness_ht_path(data_source, freeze))
relatedness_ht = relatedness_ht.filter(relatedness_ht.kin > 0.08838835)

related_pairs_ht, related_pairs_tie_breaker = rank_related_samples(
            relatedness_ht,
            hl.read_table(qc_ht_path(data_source, freeze))
)

related_samples_to_drop_ht = hl.maximal_independent_set(related_pairs_ht.i, related_pairs_ht.j,
                                                        keep=False, tie_breaker=related_pairs_tie_breaker)
related_samples_to_drop_ht = related_samples_to_drop_ht.key_by()
related_samples_to_drop_ht = related_samples_to_drop_ht.select(**related_samples_to_drop_ht.node)
related_samples_to_drop_ht = related_samples_to_drop_ht.key_by('s')

I noticed that running this code sometimes selects a sample from a pair that has the higher mean depth but sometimes selects a sample from a pair with the lower mean depth. I checked using two separate pairs, and the samples in those lines are unique to those pairs. I also get different filtering results when I run the code on the full relatedness table, when I run the code on the table filtered to each unique pair, and when I run the code filtered to the two example pairs. Happy to share an HTML of my notebook that is clearer than my text description.

How should the tiebreaker be behaving, and why does it return different results?

It’s possible there is a bug here. Is there a subset of related_pairs_ht that exhibits the behavior? If you could send us a failing example table, I’m sure we can figure out what the issue is.

Yes! I will slack it to you. Thank you!

Sorry for posting again, but this issue is currently a blocker

Thanks for the bug report @ch-kr the fix is PR’ed https://github.com/hail-is/hail/pull/7729

1 Like