Hi all! I have three entry fields in a MatrixTable:
phased_trio_dataset.entries().mother_entry.PBT_GT
phased_trio_dataset.entries().father_entry.PBT_GT
phased_trio_dataset.entries().proband_entry.PBT_GT
They are all CallExpressions. Is there a way to save these entries (and the corresponding variants) into a separate MatrixTable? Thank you very much!
Yes!
# Keep only PBT_GT inside each member's entry struct; the row fields
# (the variants) and column fields are left as they are.
new_mt = phased_trio_dataset.select_entries(
    mother_entry=phased_trio_dataset.mother_entry.select('PBT_GT'),
    father_entry=phased_trio_dataset.father_entry.select('PBT_GT'),
    proband_entry=phased_trio_dataset.proband_entry.select('PBT_GT'),
)
Aside: .entries() is a very inefficient operation that converts the 2-D matrix representation into a 1-D table representation with one row per entry. I don’t recommend using it!
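A rough pure-Python picture of what entries() produces. This is a conceptual model only, not how Hail implements it, and the names rows, cols, entries and to_entries_table are hypothetical. Every (row, column) pair becomes its own record, and each record carries a copy of its row fields and column fields, so the result has n_rows × n_cols records.

```python
from itertools import product

# Conceptual model only: row fields (variants), column fields (samples),
# and a 2-D grid of entry fields.
rows = [{"locus": "1:100"}, {"locus": "1:200"}, {"locus": "1:300"}]
cols = [{"s": "trio_A"}, {"s": "trio_B"}]
entries = [[{"GT": "0/1"}, {"GT": "0/0"}],
           [{"GT": "1/1"}, {"GT": "0/1"}],
           [{"GT": "0/0"}, {"GT": "0/0"}]]


def to_entries_table(rows, cols, entries):
    """Flatten the matrix into one record per (row, column) pair."""
    table = []
    for (i, row), (j, col) in product(enumerate(rows), enumerate(cols)):
        # Each record repeats the row and column fields alongside the entry.
        table.append({**row, **col, **entries[i][j]})
    return table


table = to_entries_table(rows, cols, entries)
print(len(table))  # 6
print(table[0])    # {'locus': '1:100', 's': 'trio_A', 'GT': '0/1'}
```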
What’s your end goal here? Why is it important that these are in a separate MatrixTable? The MatrixTable containing these three fields can probably be used to do what you want.
Thank you so much for your response! I tried something similar myself and got the same result:
I am not sure what went wrong.
This is what I get with describe():
'proband_entry': struct {
    AD: array<int32>,
    DP: int32,
    GQ: int32,
    GT: call,
    MIN_DP: int32,
    PGT: call,
    PID: str,
    PL: array<int32>,
    PS: int32,
    RGQ: int32,
    SB: array<int32>,
    PBT_GT: call
}
'father_entry': struct {
    AD: array<int32>,
    DP: int32,
    GQ: int32,
    GT: call,
    MIN_DP: int32,
    PGT: call,
    PID: str,
    PL: array<int32>,
    PS: int32,
    RGQ: int32,
    SB: array<int32>,
    PBT_GT: call
}
'mother_entry': struct {
    AD: array<int32>,
    DP: int32,
    GQ: int32,
    GT: call,
    MIN_DP: int32,
    PGT: call,
    PID: str,
    PL: array<int32>,
    PS: int32,
    RGQ: int32,
    SB: array<int32>,
    PBT_GT: call
}
Ah, my bad, I forgot to repeat the name of the mt when referring to the entry fields:
new_mt = phased_trio_dataset.select_entries(
    mother_entry=phased_trio_dataset.mother_entry.select('PBT_GT'),
    father_entry=phased_trio_dataset.father_entry.select('PBT_GT'),
    proband_entry=phased_trio_dataset.proband_entry.select('PBT_GT'),
)
I’ve also updated my original post.
Aside: This is why I tend to use really short names for my matrix tables, like mt or ds.
Hi @danking, thank you so much for that! Apologies for a quick follow-up question, but I am trying to retain only the heterozygous phased SNPs in the mt with this:
new_mt_het = new_mt.filter_entries(hl.agg.any(new_mt.entries().proband_entry.PBT_GT.is_het()))
However, that gave me the error:
May I know what went wrong? It worked for me before.
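new_mt_het = new_mt.filter_rows(hl.agg.any(new_mt.proband_entry.PBT_GT.is_het()))
I took out the .entries() after new_mt. That was the main problem. Calling entries() is saying “make me a new table based on this MatrixTable that has a row for each entry”. You were getting the proband_entry field of that new table instead of the MatrixTable, so Hail reported that you were mixing expressions from two different data sources.
Note that the line above uses filter_rows rather than filter_entries: hl.agg.any aggregates over the columns of each row, so it keeps a variant if at least one proband is heterozygous there, and filter_entries does not accept aggregations. If you instead want to keep every variant but drop the individual entries that are not heterozygous, use new_mt.filter_entries(new_mt.proband_entry.PBT_GT.is_het()).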
If you haven’t already, take a look at the visual cheat sheets, which visualize what various Table / MatrixTable operations are doing: