Calculating AF on a subset of samples

Hi,

I am trying to calculate AF on a subset of samples, but it is still calculated on all samples, even though mt.filter_cols() successfully reduces the samples.

# number of samples in mt
len(mt.s.collect())
7328

# number of samples in mt_case
mt_case = mt.filter_cols(hl.set(sampleids).contains(mt.s))
len(mt_case.s.collect())
32

mt_case.rows().to_pandas()

0	1:11854476	[T, G]	rs1801131	-10.0	None	False	[0.249401]	[10083, 4573]	[0.6879776200873362, 0.31202237991266374]	14656	[3496, 741]
...

What is wrong with my code?

Thanks.

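I had run variant_qc on the full mt first:
mt = hl.variant_qc(mt) and only then created the subsets.

The solution is to run variant_qc again on each subset after creating it: mt_case = hl.variant_qc(mt_case). filter_cols() only removes samples; it does not recompute row annotations that already exist, so the variant_qc fields keep the values calculated on all samples (the 14656 in the row above is 2 x 7328). After rerunning it, mt_case.rows() shows values calculated only on the subset of samples.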
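
To illustrate why the order matters, here is a minimal plain-Python sketch of the same behaviour. It is not Hail code: the function alt_allele_frequency, the sample names, and the genotype values are all made up for illustration, and the comments only point to the Hail calls from this thread that each step is meant to resemble.

```python
# Plain-Python model of the behaviour (no Hail required).
# Genotypes are counts of the alternate allele (0, 1 or 2) per diploid sample.

def alt_allele_frequency(genotypes, samples):
    """Alternate allele frequency over the given samples (hypothetical helper)."""
    alt_alleles = sum(genotypes[s] for s in samples)
    return alt_alleles / (2 * len(samples))

# Hypothetical data for a single variant.
genotypes = {"s1": 0, "s2": 0, "s3": 1, "s4": 2, "s5": 0, "s6": 1}
all_samples = list(genotypes)
case_ids = {"s3", "s4"}

# Step 1: annotate on the full dataset (resembles mt = hl.variant_qc(mt)).
row = {"AF": alt_allele_frequency(genotypes, all_samples)}

# Step 2: subset the samples (resembles mt.filter_cols(...)).
# The row annotation is stored data, so subsetting does not touch it.
case_samples = [s for s in all_samples if s in case_ids]
stale_af = row["AF"]

# Step 3: recompute on the subset (resembles mt_case = hl.variant_qc(mt_case)).
row["AF"] = alt_allele_frequency(genotypes, case_samples)
fresh_af = row["AF"]

print("AF before recomputing:", stale_af)
print("AF after recomputing: ", fresh_af)
```

The value read in step 2 still reflects all six samples, and only the recomputation in step 3 gives the frequency for the two case samples.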
