Thanks for the reply.
Here’s the mt.describe() of my original matrix table:
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
    'pheno': struct {
        fam_id: str,
        pat_id: str,
        mat_id: str,
        is_female: bool,
        is_case: bool
    }
    'sample_id': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        AF: array<float64>,
        AQ: array<int32>,
        AC: array<int32>,
        AN: int32,
        segdup_flag: str
    }
    'is_y': bool
----------------------------------------
Entry fields:
    'GT': call
    'RNC': array<str>
    'DP': int32
    'AD': array<int32>
    'SB': array<int32>
    'GQ': int32
    'PL': array<int32>
----------------------------------------
Column key: ['sample_id']
Row key: ['locus', 'alleles']
----------------------------------------
Here’s the output from trio_mt.describe():
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    'id': str
    'proband': struct {
        s: str,
        pheno: struct {
            fam_id: str,
            pat_id: str,
            mat_id: str,
            is_female: bool,
            is_case: bool
        },
        sample_id: str,
    }
    'father': struct {
        s: str,
        pheno: struct {
            fam_id: str,
            pat_id: str,
            mat_id: str,
            is_female: bool,
            is_case: bool
        },
        sample_id: str,
    }
    'mother': struct {
        s: str,
        pheno: struct {
            fam_id: str,
            pat_id: str,
            mat_id: str,
            is_female: bool,
            is_case: bool
        },
        sample_id: str,
    }
    'is_female': bool
    'fam_id': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        AF: array<float64>,
        AQ: array<int32>,
        AC: array<int32>,
        AN: int32,
        segdup_flag: str
    }
    'is_y': bool
----------------------------------------
Entry fields:
    'proband_entry': struct {
        GT: call,
        RNC: array<str>,
        DP: int32,
        AD: array<int32>,
        SB: array<int32>,
        GQ: int32,
        PL: array<int32>
    }
    'father_entry': struct {
        GT: call,
        RNC: array<str>,
        DP: int32,
        AD: array<int32>,
        SB: array<int32>,
        GQ: int32,
        PL: array<int32>
    }
    'mother_entry': struct {
        GT: call,
        RNC: array<str>,
        DP: int32,
        AD: array<int32>,
        SB: array<int32>,
        GQ: int32,
        PL: array<int32>
    }
----------------------------------------
Column key: ['id']
Row key: ['locus', 'alleles']
----------------------------------------
Thinking about this again, I noticed that both matrix tables use the sample ID as the column key: it is called “sample_id” in mt and “id” in trio_mt. The “id” in trio_mt should be the union of the IDs from proband, father, and mother. So, to exclude the trio samples from mt, I could filter sample_id in mt against the id in trio_mt. Is that correct?
If so, my next question is how to annotate the columns of mt using the id from trio_mt. Here’s my code:
trio_mt = hl.trio_matrix(mt, pedigree, complete_trios=True)
mt = mt.annotate_cols(id_in_trio = trio_mt[mt.sample_id].id)
cc_mt = mt.filter_cols(mt.sample_id == mt.id_in_trio, keep=False)
But I got this error message:
TypeError Traceback (most recent call last)
<ipython-input-24-c24456f7622f> in <module>
1 trio_mt = hl.trio_matrix(mt, pedigree, complete_trios=True)
----> 2 mt = mt.annotate_cols(id_in_trio = trio_mt[mt.sample_id].id)
3 cc_mt = mt.filter_cols(mt.sample_id == mt.id_in_trio, keep=False)
/gpfs/home/qwu24/ngs/lib/python3.7/site-packages/hail/matrixtable.py in __getitem__(self, item)
627 except TypeError as e:
628 raise invalid_usage from e
--> 629 raise invalid_usage
630
631 @property
TypeError: MatrixTable.__getitem__: invalid index argument(s)
Usage 1: field selection: mt['field']
Usage 2: Entry joining: mt[mt2.row_key, mt2.col_key]
To join row or column fields, use one of the following:
rows:
mt.index_rows(mt2.row_key)
mt.rows().index(mt2.row_key)
mt.rows()[mt2.row_key]
cols:
mt.index_cols(mt2.col_key)
mt.cols().index(mt2.col_key)
mt.cols()[mt2.col_key]
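Going by the column-join hints in that message, one workaround I can imagine is to skip the join altogether: collect the sample IDs of every trio member from the columns of trio_mt and filter mt against that set. The code below is only a rough sketch, not something I have verified on my data. The names trio_member_ids and drop_trio_samples are ones I made up for illustration, and it reads the IDs from the nested proband, father, and mother structs in case “id” turns out to hold only the proband's ID. It also assumes the number of trios is small enough to collect locally.

```python
from types import SimpleNamespace


def trio_member_ids(trio_cols, id_field="sample_id"):
    """Collect the IDs of every proband, father, and mother.

    trio_cols is any iterable of records exposing .proband, .father and
    .mother, e.g. the list returned by trio_mt.cols().collect().
    """
    ids = set()
    for col in trio_cols:
        for role in ("proband", "father", "mother"):
            member = getattr(col, role, None)
            if member is not None:  # a parent may be missing in an incomplete trio
                ids.add(getattr(member, id_field, None))
    ids.discard(None)
    return ids


def drop_trio_samples(mt, trio_mt):
    """Hypothetical helper: keep only the columns of mt that belong to no trio."""
    import hail as hl  # imported here so the helper above runs without Hail

    ids = trio_member_ids(trio_mt.cols().collect())
    in_trio = hl.literal(ids, "set<str>")  # explicit dtype so an empty set also works
    return mt.filter_cols(in_trio.contains(mt.sample_id), keep=False)


# Stand-in records (no Hail needed) to show what the first helper returns.
demo_cols = [
    SimpleNamespace(
        proband=SimpleNamespace(sample_id="kid1"),
        father=SimpleNamespace(sample_id="dad1"),
        mother=SimpleNamespace(sample_id="mom1"),
    )
]
demo_ids = trio_member_ids(demo_cols)
```

With the tables from this post, the call would be cc_mt = drop_trio_samples(mt, trio_mt).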
Any help would be appreciated.