I have a table that contains exon inclusion estimates from RNA-Seq with the columns “event” (exon), “sample” (the individual), “tissue”, and “PSI” (exon inclusion ratio).
I would like to create a matrix table in which the rows are the events, the columns are the samples, and each entry contains a struct where the fields are the tissues and the values are the PSI measurements.
The best I was able to do was mt = ht.to_matrix_table(["EVENT"], ["sample"]), but that gives me the following schema:
----------------------------------------
Global fields:
None
----------------------------------------
Column fields:
'sample': str
----------------------------------------
Row fields:
'EVENT': str
----------------------------------------
Entry fields:
'PSI': float64
'tissue': str
----------------------------------------
Column key: ['sample']
Row key: ['EVENT']
How can I convert the entry fields to a struct?
Many thanks!
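to_matrix_table assumes that each combination of row key and column key maps to exactly one entry, which it shouldn’t (sorry!). Here each (event, sample) pair has one record per tissue, so that assumption does not hold. For now, you can do this instead: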
In [28]: t = hl.utils.range_table(27)
...:
...: t = t.key_by(event='exon_' + hl.str(t.idx // 9), sample='sample_' + hl.str(t.idx // 3 % 3), tissue='tissue_' + hl.str(t.idx % 3), psi=t.idx).drop('idx')
...:
...: t.show()
...:
...: all_tissues = t.aggregate(hl.agg.collect_as_set(t.tissue))
...:
...: mt = t.to_matrix_table(['event'], ['sample', 'tissue'])
...: mt = mt.group_cols_by(mt.sample).aggregate(tissue_to_psi_dict = hl.dict(hl.agg.collect((mt.tissue, mt.psi))))
...: mt = mt.select_entries(**{
...:     tissue_name: mt.tissue_to_psi_dict[tissue_name] for tissue_name in all_tissues
...: })
...: mt.show()
...:
+----------+------------+------------+-------+
| event | sample | tissue | psi |
+----------+------------+------------+-------+
| str | str | str | int32 |
+----------+------------+------------+-------+
| "exon_0" | "sample_0" | "tissue_0" | 0 |
| "exon_0" | "sample_0" | "tissue_1" | 1 |
| "exon_0" | "sample_0" | "tissue_2" | 2 |
| "exon_0" | "sample_1" | "tissue_0" | 3 |
| "exon_0" | "sample_1" | "tissue_1" | 4 |
| "exon_0" | "sample_1" | "tissue_2" | 5 |
| "exon_0" | "sample_2" | "tissue_0" | 6 |
| "exon_0" | "sample_2" | "tissue_1" | 7 |
| "exon_0" | "sample_2" | "tissue_2" | 8 |
| "exon_1" | "sample_0" | "tissue_0" | 9 |
| "exon_1" | "sample_0" | "tissue_1" | 10 |
| "exon_1" | "sample_0" | "tissue_2" | 11 |
| "exon_1" | "sample_1" | "tissue_0" | 12 |
| "exon_1" | "sample_1" | "tissue_1" | 13 |
| "exon_1" | "sample_1" | "tissue_2" | 14 |
| "exon_1" | "sample_2" | "tissue_0" | 15 |
| "exon_1" | "sample_2" | "tissue_1" | 16 |
| "exon_1" | "sample_2" | "tissue_2" | 17 |
| "exon_2" | "sample_0" | "tissue_0" | 18 |
| "exon_2" | "sample_0" | "tissue_1" | 19 |
| "exon_2" | "sample_0" | "tissue_2" | 20 |
| "exon_2" | "sample_1" | "tissue_0" | 21 |
| "exon_2" | "sample_1" | "tissue_1" | 22 |
| "exon_2" | "sample_1" | "tissue_2" | 23 |
| "exon_2" | "sample_2" | "tissue_0" | 24 |
| "exon_2" | "sample_2" | "tissue_1" | 25 |
| "exon_2" | "sample_2" | "tissue_2" | 26 |
+----------+------------+------------+-------+
2021-05-19 18:06:46 Hail: INFO: Coerced sorted dataset
2021-05-19 18:06:46 Hail: INFO: Coerced dataset with out-of-order partitions.
+----------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| event | 'sample_0'.tissue_2 | 'sample_0'.tissue_1 | 'sample_0'.tissue_0 | 'sample_1'.tissue_2 | 'sample_1'.tissue_1 | 'sample_1'.tissue_0 | 'sample_2'.tissue_2 | 'sample_2'.tissue_1 |
+----------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| str | int32 | int32 | int32 | int32 | int32 | int32 | int32 | int32 |
+----------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| "exon_0" | 2 | 1 | 0 | 5 | 4 | 3 | 8 | 7 |
| "exon_1" | 11 | 10 | 9 | 14 | 13 | 12 | 17 | 16 |
| "exon_2" | 20 | 19 | 18 | 23 | 22 | 21 | 26 | 25 |
+----------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
+---------------------+
| 'sample_2'.tissue_0 |
+---------------------+
| int32 |
+---------------------+
| 6 |
| 15 |
| 24 |
+---------------------+
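To make the reshaping easier to follow, here is a rough plain-Python sketch of the same logic without Hail: collect a tissue-to-PSI mapping for each (event, sample) pair, then expand it into one field per tissue. The names records, tissue_to_psi, and entries are made up for this illustration, and the toy data only mirrors the example table above; it is not how Hail does this internally.

```python
from collections import defaultdict

# Toy long-format records mirroring the example table:
# (event, sample, tissue, psi), with psi running from 0 to 26.
records = [
    (f"exon_{i // 9}", f"sample_{i // 3 % 3}", f"tissue_{i % 3}", i)
    for i in range(27)
]

# Every tissue seen anywhere in the data; each entry gets one field per tissue.
all_tissues = sorted({tissue for _, _, tissue, _ in records})

# Step 1: for each (event, sample) pair, collect a tissue -> psi mapping.
# This plays the role of the grouped aggregation into a dict.
tissue_to_psi = defaultdict(dict)
for event, sample, tissue, psi in records:
    tissue_to_psi[(event, sample)][tissue] = psi

# Step 2: expand each mapping into a fixed set of fields.
# dict.get yields None where a tissue was not measured for that pair.
entries = {
    key: {tissue: by_tissue.get(tissue) for tissue in all_tissues}
    for key, by_tissue in tissue_to_psi.items()
}

print(entries[("exon_0", "sample_1")])
# {'tissue_0': 3, 'tissue_1': 4, 'tissue_2': 5}
```

One thing to watch with real data: if some sample lacks a tissue, indexing the Hail dict with mt.tissue_to_psi_dict[tissue_name] should raise an error for the missing key, whereas mt.tissue_to_psi_dict.get(tissue_name) should give a missing value instead, much like dict.get in the sketch.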