Can't pick concerned recipient_id from mt

Haseeb1 · October 4, 2019, 1:54pm

Hello!
I have filtered some of my data by HC/LC, and now I want to get the respective recipient_id for those rows, but I can't figure out how.

tpoterba · October 4, 2019, 5:42pm

Can you give an example of your data and what you want to get out?

Haseeb1 · October 4, 2019, 6:32pm

Basically the participant_id is my global field, which is the “S” when I do mt.describe(). I can access the IDs with mt.s.show(), but I just want the IDs that correspond to my filtered rows.
The data is whole exome sequencing data of 100 participants.

tpoterba · October 4, 2019, 6:33pm

But I just want the IDs that correspond to my filtered rows.

I still don’t understand this part, sorry.

What would the right answer look like?

Haseeb1 · October 4, 2019, 6:38pm

When I do mt.s.show() it shows a column of recipient IDs.
When I do mt.rows().show() I get rows with locus, alleles, HC/LC LoF, etc.
Now I want to know the participant_id of a specific locus.

tpoterba · October 4, 2019, 6:39pm

participant_id of a specific locus

There is an entry record for every sample (participant ID) at each locus.

How about mt.show() or mt.entries().show()?

Haseeb1 · October 4, 2019, 6:41pm

Both methods give me the locus, alleles, and related data, but not the participant_ids.

Haseeb1 · October 4, 2019, 6:43pm

Just to let you know, I have applied VEP and variant_qc to my data, then filtered to keep only the LoF HC variants, and I want the IDs for those.

tpoterba · October 4, 2019, 6:45pm

Can you define what “the recipient ids of a locus” means? Does this mean the participant IDs of samples that have a non-reference genotype at that locus?

Haseeb1 · October 4, 2019, 6:48pm

Yes.

tpoterba · October 4, 2019, 6:51pm

Ahhh. I was quite confused by that terminology.

In general you won’t want to do something like this in a pipeline operating on the full data, but you can do the following:

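import hail as hl

# For each row (variant), collect the IDs of samples with a non-reference genotype.
mt = mt.annotate_rows(
    non_reference_samples=hl.agg.filter(mt.GT.is_non_ref(),
                                        hl.agg.collect(mt.s)))
mt.non_reference_samples.show()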
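
To make concrete what that per-row aggregation computes, here is a plain-Python toy model. It is not Hail code: the dictionary layout, the sample IDs, and the helper name `non_reference_samples` are made up for illustration, and genotypes are written as tuples of allele indices with `None` standing in for a missing call.

```python
# Toy model of a matrix table: one dict of {sample_id: genotype} per variant.
# Genotypes are tuples of allele indices (0 = reference); None = missing call.
# All names and values here are illustrative, not taken from the real dataset.
entries = {
    ("1:1000", "A/T"): {"P1": (0, 0), "P2": (0, 1), "P3": None},
    ("1:2000", "G/C"): {"P1": (1, 1), "P2": (0, 0), "P3": (0, 1)},
}


def non_reference_samples(genotypes):
    """Return IDs of samples whose called genotype has a non-reference allele."""
    return [
        sample
        for sample, gt in genotypes.items()
        if gt is not None and any(allele != 0 for allele in gt)
    ]


# One list of carrier IDs per variant, analogous to a new row field.
carriers = {variant: non_reference_samples(gts) for variant, gts in entries.items()}

for variant, samples in carriers.items():
    print(variant, samples)
```

Missing calls are skipped here, and homozygous-alternate and heterozygous samples are both counted as carriers.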

Haseeb1 · October 4, 2019, 6:53pm

Thanks, I'll try it and get back to you. Sorry for the terminology. Peace.

Danish436 · October 4, 2019, 7:52pm

I believe we are looking for the following.

We have run VEP and want to identify which participants carry LoF mutations. So we would like to filter on the LoF HC flag and extract the genes, locus positions, and participants carrying those LoF mutations.
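
A rough sketch of what that could look like, under some assumptions that are not confirmed in this thread: that the VEP annotations sit in a row field `mt.vep` with a `transcript_consequences` array whose elements carry `lof` and `gene_symbol` fields (as in a VEP configuration that includes LOFTEE), and that the genotype entry field is `GT`. The function name `lof_hc_carriers` and the field name `lof_genes` are made up for this example.

```python
def lof_hc_carriers(mt):
    """Return a Table with one row per (variant, carrier sample) pair.

    Assumes `mt` is a Hail MatrixTable annotated by VEP, with `lof` and
    `gene_symbol` present on each transcript consequence (an assumption).
    """
    # Keep variants with at least one high-confidence LoF transcript consequence.
    has_hc = mt.vep.transcript_consequences.any(lambda tc: tc.lof == "HC")
    mt = mt.filter_rows(has_hc)

    # Record the gene symbols of the HC LoF consequences on each variant.
    mt = mt.annotate_rows(
        lof_genes=mt.vep.transcript_consequences
        .filter(lambda tc: tc.lof == "HC")
        .map(lambda tc: tc.gene_symbol)
    )

    # Drop entries without a non-reference genotype, then flatten to a Table
    # keyed by locus, alleles, and sample ID.
    mt = mt.filter_entries(mt.GT.is_non_ref())
    ht = mt.entries()
    return ht.select("lof_genes", "GT")
```

Usage would be something like `ht = lof_hc_carriers(mt)` followed by `ht.show()`. As noted above, flattening entries like this is something to do on an already heavily filtered dataset rather than on the full data.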

Danish436 · October 5, 2019, 2:56am

Thank you. This provides all non_reference carriers; how about just the rare genotypes?

danking · October 7, 2019, 12:46am

What is your definition of a rare genotype? If you only want to look at rows (i.e. variants) with an alternate allele frequency less than 0.001, you might try adding this before the commands Tim suggested above:

mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.AF[1] < 0.001)
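
One thing worth checking before applying that threshold to a cohort of the size mentioned earlier in the thread (100 participants): with diploid calls, the smallest non-zero allele frequency a variant can have is 1 / (2 × 100) = 0.005. A quick arithmetic sketch in plain Python, assuming every sample has a called diploid genotype (the helper name is hypothetical):

```python
def min_nonzero_af(n_samples: int, ploidy: int = 2) -> float:
    """Smallest non-zero allele frequency when all samples are called."""
    return 1 / (ploidy * n_samples)


smallest = min_nonzero_af(100)
print(smallest)          # 0.005
print(smallest < 0.001)  # False
```

So with 100 samples, a cutoff of 0.001 would only keep variants with no alternate alleles among the called genotypes, and missing calls make the smallest possible frequency larger still. A higher frequency threshold, or a cutoff on the allele count such as mt.variant_qc.AC[1], may be closer to what is wanted here.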
