Calculating AF on subset of samples

igorm · February 24, 2023, 11:30am

Hi,

I am trying to calculate AF on a subset of samples, but it is still calculated on all samples even though mt.filter_cols() successfully reduces the sample set.

# number of samples in mt
len(mt.s.collect())
7328

# number of samples in mt_case
mt_case = mt.filter_cols(hl.set(sampleids).contains(mt.s))
len(mt_case.s.collect())
32

mt_case.rows().to_pandas()

0	1:11854476	[T, G]	rs1801131	-10.0	None	False	[0.249401]	[10083, 4573]	[0.6879776200873362, 0.31202237991266374]	14656	[3496, 741]
...
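A quick sanity check on the row above, as a minimal sketch: assuming a diploid site with no missing calls, AN should be twice the number of samples that went into the calculation. The variable names are only for illustration; the numbers are copied from the output above.

```python
# Values copied from the sample counts and the printed row above.
n_all_samples = 7328
n_case_samples = 32
ac = [10083, 4573]   # allele counts (ref, alt)
an = 14656           # total number of called alleles

# Assumption: diploid site, no missing genotypes, so AN = 2 * number of samples.
print(an == 2 * n_all_samples)    # does AN match the full cohort?
print(an == 2 * n_case_samples)   # does AN match the 32-sample subset?

# AF is just AC / AN, so it inherits whatever sample set AN was counted on.
af = [count / an for count in ac]
print(af)
```

AN here matches the full 7328 samples rather than the 32 in mt_case, so these row statistics were computed before the column filter.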

What is wrong with my code?

Thanks.

igorm · February 24, 2023, 2:03pm

I had added variant_qc to mt with mt = hl.variant_qc(mt) and only then created the subsets, so the row fields still held the values computed on all samples.

The solution is to run variant_qc again on the subset after creating it: mt_case = hl.variant_qc(mt_case). Then the statistics in mt_case.rows() are calculated only on the subset of samples.
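To illustrate why the order matters, here is a plain-Python sketch (not Hail code) of the same sequence: annotate once, filter samples, then recompute. The sample IDs, genotypes and the alt_af helper are all made up for this example; in Hail the recompute step corresponds to the hl.variant_qc(mt_case) call mentioned above.

```python
# Plain-Python stand-in for one variant in a matrix table:
# alt-allele count (0, 1 or 2) per sample. All values are hypothetical.
genotypes = {"s1": 0, "s2": 1, "s3": 2, "s4": 0, "s5": 1, "s6": 0}


def alt_af(gt_by_sample):
    """Alt allele frequency = alt allele count / allele number (diploid)."""
    return sum(gt_by_sample.values()) / (2 * len(gt_by_sample))


# Step 1: annotate the row once, on all samples (like running QC on mt).
row = {"AF": alt_af(genotypes)}

# Step 2: keep only the case samples (like filter_cols). The stored row
# value is carried along unchanged; nothing recomputes it.
case_ids = {"s2", "s3"}
case_genotypes = {s: g for s, g in genotypes.items() if s in case_ids}
stale_af = row["AF"]

# Step 3: recompute on the subset (like running QC again on mt_case).
fresh_af = alt_af(case_genotypes)

print("stored before filtering:", stale_af)
print("recomputed on subset:   ", fresh_af)
```

The stored value still reflects all six samples, while the recomputed one reflects only the two kept samples.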
