Picking Data Of Only 1 Participant

Haseeb1 · October 12, 2019, 9:17am

1- I have whole exome sequencing data on 100 participants. How do I pick the data of only 1 participant?
2- How could I get the data of 25 or 50 participants?

tpoterba · October 12, 2019, 12:29pm

Use mt.filter_cols(mt.s == 'ID_YOU_WANT')

If you want to select many, put them in a Python list and do:

mt.filter_cols(hl.literal(ids_to_keep).contains(mt.s))

Haseeb1 · October 14, 2019, 4:15am

It is filtering my participant IDs. Now I want to get the data of only those filtered participants.

Haseeb1 · October 14, 2019, 6:41am

Or how would I filter my rows by non_reference_samples? I can filter it by using mt.contains('1 id at a time'). I want to put a list in mt.contains(), but it is giving me this error:
"Error summary: HailException: no conversion found for contains(array, array)bool"

Haseeb1 · October 14, 2019, 11:03am

I am using this code:

data1 = ds_result1.filter_rows(ds_result1.non_reference_samples.contains("Participant_ID"))

to get all rows having this participant ID. But I want all rows having Participant_ID1, Participant_ID2, Participant_ID3 and so on; the participant IDs will be in a Python list.

tpoterba · October 14, 2019, 11:10am

This one should work:

mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))

Haseeb1 · October 14, 2019, 11:17am

It is working, but it is not filtering variants (rows). Every variant has a non_reference_samples (it's a column). I want to filter rows by their non_reference_samples.

Haseeb1 · October 14, 2019, 11:21am

mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))

When I execute this code, my participant count drops from 100 to 1, but my variant count is still unchanged.

tpoterba · October 14, 2019, 1:49pm

In order to filter rows, you’ll need to use filter_rows.

One thing you might want to do after filtering to just one sample is the following:

ht = mt.localize_entries('entries')
ht = ht.transmute(**ht.entries[0])

After this, you’ll have a table with a single GT, AD, etc, and you can filter on that.

Note that I don’t recommend doing this inside a Python loop that does something per sample. That will be very inefficient.
