1- I have whole exome sequencing data on 100 participants. How do I pick the data of only one participant?
2- How could I get the data of 25 or 50 participants?
Use mt.filter_cols(mt.s == 'ID_YOU_WANT')
If you want to select many, put them in a Python list and do:
mt.filter_cols(hl.literal(ids_to_keep).contains(mt.s))
That filters my participant IDs. Now I want to get the data of only those filtered participants.
Or how would I filter my rows by non_reference_samples? I can filter with .contains('one ID at a time'). I want to pass a list to .contains(), but it gives me an error:
"Error summary: HailException: no conversion found for contains(array, array)bool"
I am using this code:
data1 = ds_result1.filter_rows(ds_result1.non_reference_samples.contains("Participant_ID"))
to get all rows that have this participant ID. But I want all rows that have Participant_ID1, Participant_ID2, Participant_ID3, and so on; the participant IDs will be in a Python list.
This one should work:
mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))
It works, but it does not filter variants (rows). Every variant has a non_reference_samples field (it is a field on each row). I want to filter rows by their non_reference_samples.
mt = mt.filter_cols(hl.literal(participant_ids).contains(mt.s))
When I run this code, my participant count drops from 100 to 1, but my variant count is unchanged.
In order to filter rows, you'll need to use filter_rows.
One thing you might want to do after filtering to just one sample is the following:
ht = mt.localize_entries('entries')
ht = ht.transmute(**ht.entries[0])
After this, you'll have a table with a single GT, AD, etc., and you can filter on that.
Note that I don't recommend doing this inside a Python loop that does something per sample; that will be very inefficient.