Hi all, let’s say I have a MatrixTable of 500 males and 500 females (annotated with a column field, “gender”). If I want a final mt with 500 males and 250 females (a random sample), can I do this natively within the MatrixTable, i.e. randomly sample 250 of the 500 females based on mt.gender while leaving all the male samples intact? Thanks!
Here’s an easy way to do this:
samples_to_keep = mt.aggregate_cols(hl.agg.filter(mt.gender == 'M', hl.agg.collect(mt.s)).extend(hl.agg.filter(mt.gender == 'F', hl.agg.take(mt.s, 250, ordering=hl.rand_unif()))
mt = mt.filter_cols(hl.literal(samples_to_keep).contains(mt.s))
I tried:
samples_to_keep = mt.aggregate_cols(hl.agg.filter(mt.gender == 'M', hl.agg.collect(mt.s)).extend(hl.agg.filter(mt.gender == 'F', hl.agg.take(mt.s, 250, ordering=hl.rand_unif()))
and I got:
SyntaxError: unexpected EOF while parsing
Can I check which part causes the error? Thank you very much.
I realized there were not enough right brackets for the line, and after correcting that:
samples_to_keep = mt.aggregate_cols(
    hl.agg.filter(
        mt.gender == 'M', hl.agg.collect(mt.s)
    ).extend(
        hl.agg.filter(mt.gender == 'F', hl.agg.take(mt.s, 105, ordering=hl.rand_unif()))
    )
)
I got this error:
TypeError: missing a required argument: 'lower'
hl.rand_unif requires a lower and upper argument. It doesn’t default to 0, 1. I’ll make a PR to fix that. In the meantime you’ll need to change to hl.rand_unif(0, 1).
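For anyone who would rather do the sampling outside Hail, here is a minimal sketch of the same selection logic in plain Python: keep every male ID and draw a random subset of the female IDs. It is a different route from the aggregator approach above, not a replacement for it. The helper name choose_samples, the gender_by_sample dict, and the toy sample IDs are all made up for illustration.

```python
import random

def choose_samples(gender_by_sample, n_female, seed=None):
    """Return all male sample IDs plus n_female randomly chosen female IDs."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    males = [s for s, g in gender_by_sample.items() if g == 'M']
    females = [s for s, g in gender_by_sample.items() if g == 'F']
    # rng.sample draws without replacement and raises ValueError
    # if n_female is larger than the number of females available.
    return males + rng.sample(females, n_female)

# Toy stand-in for the column annotations: 500 males and 500 females.
gender_by_sample = {f'M{i}': 'M' for i in range(500)}
gender_by_sample.update({f'F{i}': 'F' for i in range(500)})

samples_to_keep = choose_samples(gender_by_sample, 250, seed=0)
print(len(samples_to_keep))  # prints 750
```

With a real MatrixTable, I would expect the dict to come from the collected column annotations (mt.s and mt.gender), and the resulting list could then be passed to the same filter_cols call shown earlier in the thread. That part is an assumption and not tested here.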