Picking Data Of Only 1 Participant

1- I have whole exome sequencing data on 100 participants. How do I pick the data of only 1 participant?
2- How can I get the data of 25 or 50 participants?

Use mt.filter_cols(mt.s == 'ID_YOU_WANT').

If you want to select many, put them in a Python list and filter the columns on membership in that list.


It is filtering my participant IDs. Now I want to get the data of only those filtered participants.

Or how would I filter my rows by non_reference_samples? I can filter with contains() using one ID at a time. I want to pass a list to contains(), but that gives me an error:
"Error summary: HailException: no conversion found for contains(array, array)bool"

I am using this code:

data1 = ds_result1.filter_rows(ds_result1.non_reference_samples.contains("Participant_ID"))

to get all rows that have this participant ID. But I want all rows that have Participant_ID1, Participant_ID2, Participant_ID3, and so on. The participant IDs will be in a Python list.

This one should work:

mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))

It is working, but it is not filtering variants (rows). Every variant has a non_reference_samples field (it's a column). I want to filter rows by their non_reference_samples.

mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))

When I run this code, my participant count drops from 100 to 1, but my variant count is unchanged.

In order to filter rows, you’ll need to use filter_rows. filter_cols only removes samples, so the variant count stays the same.

One thing you might want to do after filtering to just one sample is the following:

ht = mt.localize_entries('entries')
ht = ht.transmute(**ht.entries[0])

After this, you’ll have a table with one row per variant and a single GT, AD, etc. for that sample, and you can filter on that.

Note that I don’t recommend doing this inside a Python loop that does something per sample; that will be very inefficient.