I have filtered some of my data by HC/LC, and now I want to get the respective recipient_id for those rows, but I can't figure it out.
Can you give an example of your data and what you want to get out?
Basically, participant_id is my sample identifier, which shows up as “s” when I run mt.describe(); I can access the IDs with mt.s.show(). But I just want the IDs that correspond to my filtered rows.
The data is whole-exome sequencing data from 100 participants.
But I just want the IDs that correspond to my filtered rows.
I still don’t understand this part, sorry.
What would the right answer look like?
- When I run mt.s.show(), it shows a column of recipient IDs.
- When I run mt.rows().show(), I get rows with locus, alleles, HC/LC LoF, etc.
Now I want to know the participant_id of a specific locus.
participant_id of a specific locus
There is an entry record for every sample (participant ID) at each locus.
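To make the row/column/entry distinction concrete, here is a plain-Python toy model (not Hail code) of the layout: rows are keyed by locus and alleles, columns by the participant ID s, and each (row, column) pair has its own entry holding the genotype. The loci, alleles, and participant IDs are made up. In Hail, mt.entries() produces the analogous flattened table, though on full data it is very large.

```python
from itertools import product

# Hypothetical toy data: two row keys (locus, alleles) and three column keys (s).
rows = [("chr1:12345", ("A", "G")), ("chr2:67890", ("C", "T"))]
cols = ["P001", "P002", "P003"]

# Genotypes as (allele_index, allele_index) pairs; None stands for a missing call.
gt = {
    (rows[0], "P001"): (0, 1),
    (rows[0], "P002"): (0, 0),
    (rows[0], "P003"): (1, 1),
    (rows[1], "P001"): (0, 0),
    (rows[1], "P002"): None,
    (rows[1], "P003"): (0, 1),
}

def entries(rows, cols, gt):
    """Flatten the grid into one record per (row, column) pair."""
    return [
        {"locus": locus, "alleles": alleles, "s": s, "GT": gt[((locus, alleles), s)]}
        for (locus, alleles), s in product(rows, cols)
    ]

table = entries(rows, cols, gt)

# Every locus has an entry for every participant: 2 rows x 3 columns = 6 records.
print(len(table))
for rec in table:
    if rec["locus"] == "chr1:12345":
        print(rec["s"], rec["GT"])
```

The point of the model is that a locus does not "have" participant IDs by itself; every participant has an entry at every locus, so getting IDs for a locus always means choosing which entries to keep.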
Both methods give me the locus, alleles, and related data, but not the participant_ids.
Just to let you know, I applied VEP and variant_qc to my data, then filtered to keep only HC LoF variants, and I want the IDs for those.
Can you define what “the recipient ids of a locus” means? Does this mean the participant IDs of samples that have a non-reference genotype at that locus?
Ahhh. I was quite confused by that terminology.
In general you won’t want to do something like this in a pipeline operating on the full data, but you can do the following:
mt = mt.annotate_rows(non_reference_samples=hl.agg.filter(mt.GT.is_non_ref(), hl.agg.collect(mt.s)))
mt.non_reference_samples.show()
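The aggregation above visits every entry in a row, keeps those where GT.is_non_ref() is true (het or hom-alt), and collects their s values into one list per row. Below is a plain-Python model of that per-row logic with made-up data; it is not Hail code. Missing genotype calls are skipped, which I believe matches hl.agg.filter dropping records whose condition is missing, but treat that as an assumption.

```python
# Hypothetical toy data: per-locus genotype calls keyed by participant ID.
# Genotypes are (allele_index, allele_index) pairs; None stands for a missing call.
calls = {
    ("chr1:12345", ("A", "G")): {"P001": (0, 1), "P002": (0, 0), "P003": (1, 1)},
    ("chr2:67890", ("C", "T")): {"P001": (0, 0), "P002": None, "P003": (0, 1)},
}

def is_non_ref(gt):
    """True for het or hom-alt calls, i.e. at least one non-reference allele."""
    return any(allele > 0 for allele in gt)

def non_reference_samples(genotypes):
    """Collect IDs whose call is non-reference; missing calls are skipped."""
    return [s for s, gt in genotypes.items() if gt is not None and is_non_ref(gt)]

# One list of carrier IDs per row, analogous to the annotate_rows result.
annotated = {row: non_reference_samples(g) for row, g in calls.items()}
for (locus, alleles), carriers in annotated.items():
    print(locus, alleles, carriers)
```

Because the collected list grows with the number of carriers, common variants produce long lists; filtering the rows down first (for example to HC LoF variants) keeps the annotation small.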
Thanks, I'll try it and get back to you. Sorry about the terminology. Peace.
I believe we are looking for the following.
We have run VEP, and we want to identify which participants carry the LoF mutations. So we would like to filter on the HC LoF flag and extract the genes, locus positions, and the participants carrying those LoF mutations.
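Putting the pieces together, here is a plain-Python sketch (not Hail code) of that report: keep rows with at least one high-confidence LoF transcript, then list the gene, locus, and non-reference carriers. The field names transcript_consequences, gene_symbol, and lof are assumptions based on VEP JSON output with the LOFTEE plugin, and all data is made up. In Hail the row filter would presumably look something like mt.filter_rows(mt.vep.transcript_consequences.any(lambda tc: tc.lof == 'HC')), but check mt.describe() for the schema your VEP configuration actually produces.

```python
# Hypothetical toy rows mimicking a VEP-annotated dataset. Field names are
# assumptions (VEP JSON + LOFTEE); verify them against your own schema.
rows = [
    {
        "locus": "chr1:12345",
        "alleles": ("A", "G"),
        "transcript_consequences": [
            {"gene_symbol": "GENE1", "lof": "HC"},
            {"gene_symbol": "GENE1", "lof": None},
        ],
        "genotypes": {"P001": (0, 1), "P002": (0, 0), "P003": (1, 1)},
    },
    {
        "locus": "chr2:67890",
        "alleles": ("C", "T"),
        "transcript_consequences": [{"gene_symbol": "GENE2", "lof": "LC"}],
        "genotypes": {"P001": (0, 1), "P002": (0, 1), "P003": (0, 0)},
    },
]

def hc_lof_genes(row):
    """Genes with at least one transcript flagged as high-confidence LoF."""
    return sorted({tc["gene_symbol"] for tc in row["transcript_consequences"] if tc["lof"] == "HC"})

def carriers(row):
    """Participant IDs with a het or hom-alt call; missing calls are skipped."""
    return [s for s, gt in row["genotypes"].items() if gt is not None and any(a > 0 for a in gt)]

report = []
for row in rows:
    genes = hc_lof_genes(row)
    if genes:  # keep only rows with at least one HC LoF consequence
        report.append((row["locus"], row["alleles"], genes, carriers(row)))

for locus, alleles, genes, ids in report:
    print(locus, alleles, genes, ids)
```

Only the first row survives here, since the second one is flagged LC rather than HC.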
Thank you. This gives all non-reference carriers; how about just the rare genotypes?
What is your definition of a rare genotype? If you only want to look at rows (i.e. variants) with an alternate allele frequency less than 0.001, you might try adding this before the commands Tim suggested above:
mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.AF[1] < 0.001)  # AF is per allele; AF[0] is the reference
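For intuition about that threshold, here is a plain-Python model (not Hail code) of an alternate allele frequency computed as AC / AN over called genotypes, which is my understanding of what variant_qc.AF[1] holds at a biallelic site. The sites and participant IDs are made up, and smallest_nonzero_af is a hypothetical helper added only for illustration.

```python
# Hypothetical toy calls: two biallelic sites, four participants.
# Genotypes are (allele_index, allele_index) pairs; None = missing call.
sites = {
    "chr1:12345": {"P001": (0, 1), "P002": (0, 0), "P003": None, "P004": (0, 0)},
    "chr2:67890": {"P001": (0, 0), "P002": (0, 0), "P003": (0, 0), "P004": (0, 0)},
}

def alt_af(calls):
    """Alternate allele frequency AC / AN, counting only called genotypes."""
    called = [gt for gt in calls.values() if gt is not None]
    an = sum(len(gt) for gt in called)             # called allele count
    ac = sum(a == 1 for gt in called for a in gt)  # copies of the first alt allele
    return ac / an if an else None

def rare_sites(sites, threshold):
    """Keep sites whose alt AF is defined and below the threshold."""
    result = {}
    for locus, calls in sites.items():
        af = alt_af(calls)
        if af is not None and af < threshold:
            result[locus] = af
    return result

def smallest_nonzero_af(n_participants, ploidy=2):
    """AF contributed by a single alt allele when every participant is called."""
    return 1 / (n_participants * ploidy)

print(alt_af(sites["chr1:12345"]))  # 1 alt allele out of 6 called alleles
print(rare_sites(sites, 0.001))     # only the monomorphic site passes
print(smallest_nonzero_af(100))
```

One thing worth checking: with 100 diploid participants, a single alternate allele already gives an in-sample AF of 1/200 = 0.005, so AF[1] < 0.001 would keep only sites where nobody carries the allele. You may want a higher in-sample threshold, or a population frequency from an external reference, depending on what "rare" should mean for your study.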