I’m running maximal_independent_set
on a ht created using pc_relate
but am seeing inconsistent filtering results.
Here is the code:
def rank_related_samples(
relatedness_ht: hl.Table,
qc_ht: hl.Table
) -> Tuple[hl.Table, Callable[[hl.expr.Expression, hl.expr.Expression], hl.expr.NumericExpression]]:
def annotate_related_pairs(related_pairs: hl.Table, index_col: str) -> hl.Table:
related_pairs = related_pairs.key_by(**related_pairs[index_col])
return related_pairs.annotate(
**{
index_col: related_pairs[index_col].annotate(
dp_mean=hl.or_else(qc_ht[related_pairs.key].sample_qc.dp_stats.mean, -1.0)
)
}
).key_by()
relatedness_ht = annotate_related_pairs(relatedness_ht, "i")
relatedness_ht = annotate_related_pairs(relatedness_ht, "j")
def tie_breaker(l, r):
return (l.dp_mean - r.dp_mean)
return relatedness_ht, tie_breaker
and
data_source = 'broad'
freeze = 4
relatedness_ht = hl.read_table(relatedness_ht_path(data_source, freeze))
relatedness_ht = relatedness_ht.filter(relatedness_ht.kin > 0.08838835)
related_pairs_ht, related_pairs_tie_breaker = rank_related_samples(
relatedness_ht,
hl.read_table(qc_ht_path(data_source, freeze))
)
related_samples_to_drop_ht = hl.maximal_independent_set(related_pairs_ht.i, related_pairs_ht.j,
keep=False, tie_breaker=related_pairs_tie_breaker)
related_samples_to_drop_ht = related_samples_to_drop_ht.key_by()
related_samples_to_drop_ht = related_samples_to_drop_ht.select(**related_samples_to_drop_ht.node)
related_samples_to_drop_ht = related_samples_to_drop_ht.key_by('s')
I noticed that running this code sometimes selects a sample from a pair that has the higher mean depth but sometimes selects a sample from a pair with the lower mean depth. I checked using two separate pairs, and the samples in those lines are unique to those pairs. I also get different filtering results when I run the code on the full relatedness table, when I run the code on the table filtered to each unique pair, and when I run the code filtered to the two example pairs. Happy to share an HTML of my notebook that is clearer than my text description.
How should the tiebreaker be behaving, and why does it return different results?