Hail 0.2 - Questions on MatrixTable joins for columns

bjo · April 14, 2018, 11:40pm

Hi,

I have a couple of questions about annotating my MatrixTable.

I have an imported vcf (ds), and I want to annotate it based on a file that has samples as columns:

ID SAMP-A SAMP-B SAMP-C SAMP-D …
PC1 0.0154 0.0139 0.0145 -0.0728 …
PC2 -0.0093 -0.0097 -0.0093 -0.0077 …
PC3 0.0107 0.0067 0.0327 -0.0044 …
PC4 -0.0038 -0.0144 0.0056 -0.0146 …
PC5 -0.0083 0.0296 0.0510 0.0063 …

using the command:

covs_kt = hl.import_matrix_table(anno_path, entry_type=hl.tfloat64, row_fields={'ID': hl.tstr}, row_key='ID')

Now SAMP-A, SAMP-B, and so on appear as values of both ds.s and covs_kt.col_id.

I have a couple of questions at this stage:
1. In Hail 0.1, I had to export covs_kt to pandas, transpose it, convert it back to a KeyTable, and then run annotate_samples_table(covs_kt, root='sa.covs').

I now understand that index-based joins are possible for rows, but it seems they're not possible for columns, so covs_kt[ds.s] will not work unless I transpose covs_kt. Am I correct?
2. What would be a good way to run annotate_cols with this MatrixTable? Eventually I want to annotate the columns with all ~20,000 expression values. How can I do this based on all rows (or columns) of covs_kt? (This wasn't clear from the documentation for annotate_cols().)

Thank you for your help!

jbloom · April 15, 2018, 3:25pm

For 1, yes, you'll need to transpose the TSV so that you can read it in as a Table whose rows are keyed by ID. Given that your table isn't all that big, even with 20K fields, I'd recommend just saving a transposed copy. You might also want to rename covs_kt, since KeyTable is gone. Internally we've settled on ht for Hail Table and mt for MatrixTable.
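As a rough sketch of the "save a transposed copy" route, something like the following could work. It assumes the file is tab-delimited and small enough to hold in memory; the helper names transpose_tsv and import_sample_covariates are made up for illustration and are not part of Hail.

```python
import csv


def transpose_tsv(src_path, dst_path, delimiter="\t"):
    """Write a transposed copy of a small delimited file (rows become columns)."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    # zip(*rows) makes the i-th field of every input row into the i-th output row.
    # It assumes every row has the same number of fields; extra fields are dropped.
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        writer.writerows(zip(*rows))


def import_sample_covariates(transposed_path):
    """Read the transposed file as a Hail Table keyed by its first column."""
    import hail as hl  # imported lazily so transpose_tsv can be used without Hail

    # After transposing, the first column is still headed 'ID',
    # but its values are now the sample IDs (SAMP-A, SAMP-B, ...).
    return hl.import_table(transposed_path, impute=True, key="ID")
```

With the transposed copy on disk, ht = import_sample_covariates(path) should give one row per sample with PC1, PC2, and so on as fields, which is the shape needed for an index-based join against ds.s.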

Were your table huge, you could read it in as a MatrixTable, create a BlockMatrix using BlockMatrix.from_entry_expr, transpose the BlockMatrix, and convert that back to a Table or MatrixTable (coming soon!), which could be used directly and/or exported row-wise.

For 2, see the ** trick for adding all fields in the "Adding column fields" section of the Overview Tutorial.
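A minimal sketch of that trick, assuming ht is a Table keyed by sample ID and mt is a MatrixTable whose column key is s (both names are placeholders, as are the two helper functions). Indexing ht[mt.s] gives a struct with one field per covariate, and ** unpacks those fields into keyword arguments for annotate_cols:

```python
def annotate_cols_with_all_fields(mt, ht):
    """Add every non-key field of ht as a top-level column field of mt."""
    # ht[mt.s] looks up each column's sample ID in ht (assumed keyed by sample ID).
    # ** expands the resulting struct into field_name=expression keyword arguments.
    return mt.annotate_cols(**ht[mt.s])


def annotate_cols_nested(mt, ht, name="covs"):
    """Alternative: keep all the fields together under one struct-valued column field."""
    return mt.annotate_cols(**{name: ht[mt.s]})
```

The first form yields column fields such as mt.PC1 and mt.PC2; the second keeps them grouped as mt.covs.PC1 and so on, which may be tidier with ~20,000 fields.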

bjo · April 15, 2018, 5:34pm

Thank you, that is helpful!
