Extracting entry fields into a separate MatrixTable

Hi all! I have three entry fields in a MatrixTable:
phased_trio_dataset.entries().mother_entry.PBT_GT
phased_trio_dataset.entries().father_entry.PBT_GT
phased_trio_dataset.entries().proband_entry.PBT_GT
They are all CallExpressions. Is there a way to save these entries (and the corresponding variants) into a separate MatrixTable? Thank you very much!

Yes!

new_mt = phased_trio_dataset.select_entries(
    mother_entry = phased_trio_dataset.mother_entry.select('PBT_GT'),
    father_entry = phased_trio_dataset.father_entry.select('PBT_GT'),
    proband_entry = phased_trio_dataset.proband_entry.select('PBT_GT')
)

Aside: .entries() is a very inefficient operation which converts from a 2-d matrix representation to a 1-d table representation. I don’t recommend using it!
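
To make the 2-d versus 1-d point concrete, here is a rough sketch in plain Python (not Hail) of what flattening a matrix into an entries table amounts to. The variant strings, sample names, and the to_entries helper are all made up for illustration; the only claim is that every (variant, sample) pair becomes its own record, with the row and column keys repeated in each one.

```python
# Toy stand-in for a MatrixTable: row keys, column keys, and a 2-D grid of entries.
# All names here are hypothetical; this is plain Python, not Hail.
variants = ["1:100:A:T", "1:200:G:C", "2:50:T:G"]
samples = ["trio1", "trio2"]
grid = [
    ["0|1", "1|1"],
    ["0|0", "1|0"],
    ["1|0", "0|0"],
]


def to_entries(variants, samples, grid):
    """Flatten the grid into one record per (variant, sample) pair."""
    return [
        {"variant": v, "sample": s, "PBT_GT": grid[i][j]}
        for i, v in enumerate(variants)
        for j, s in enumerate(samples)
    ]


entries = to_entries(variants, samples, grid)
print(len(entries))  # one record for each of the 3 x 2 cells
```

The record count grows as variants times samples, and each record carries its own copy of the keys, which is why working on the matrix form directly is usually the cheaper choice.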

What’s your end goal here? Why is it important that these are in a separate MatrixTable? The MatrixTable containing these three fields can probably be used to do what you want.

Thank you so much for your response! I tried something similar myself and got the same result:


I am not sure of what went wrong.

This is what I get with describe():

'proband_entry': struct {
    AD: array<int32>, 
    DP: int32, 
    GQ: int32, 
    GT: call, 
    MIN_DP: int32, 
    PGT: call, 
    PID: str, 
    PL: array<int32>, 
    PS: int32, 
    RGQ: int32, 
    SB: array<int32>, 
    PBT_GT: call
}
'father_entry': struct {
    AD: array<int32>, 
    DP: int32, 
    GQ: int32, 
    GT: call, 
    MIN_DP: int32, 
    PGT: call, 
    PID: str, 
    PL: array<int32>, 
    PS: int32, 
    RGQ: int32, 
    SB: array<int32>, 
    PBT_GT: call
}
'mother_entry': struct {
    AD: array<int32>, 
    DP: int32, 
    GQ: int32, 
    GT: call, 
    MIN_DP: int32, 
    PGT: call, 
    PID: str, 
    PL: array<int32>, 
    PS: int32, 
    RGQ: int32, 
    SB: array<int32>, 
    PBT_GT: call
}

Ah, my bad, I forgot to repeat the name of the mt when referring to the entry fields:

new_mt = phased_trio_dataset.select_entries(
    mother_entry = phased_trio_dataset.mother_entry.select('PBT_GT'),
    father_entry = phased_trio_dataset.father_entry.select('PBT_GT'),
    proband_entry = phased_trio_dataset.proband_entry.select('PBT_GT')
)

I’ve also updated my original post.

Aside: This is why I tend to use really short names for my matrix tables like mt or ds.

Hi @danking, thank you so much for that! Apologies for a quick follow-up question, but I am trying to retain only the heterozygous phased SNPs in the mt with this:
new_mt_het = new_mt.filter_entries(hl.agg.any(new_mt.entries().proband_entry.PBT_GT.is_het()))
However, that gave me the error:


May I know what went wrong? It worked for me before.

 new_mt_het = new_mt.filter_entries(hl.agg.any(new_mt.proband_entry.PBT_GT.is_het()))

I took out the .entries() after new_mt. That was the problem. Calling entries() says “make me a new table based on this MatrixTable that has a row for each entry”. You were getting the proband_entry field of that new table instead of the MatrixTable’s own field, so Hail told you that you were using a mix of expressions from two different data sources.

@CuriousGeneticist

If you haven’t already, take a look at the visual cheat sheets, which illustrate what the various Table / MatrixTable operations are doing:

https://hail.is/docs/0.2/cheatsheets.html