Hail 0.2 - Questions on MatrixTable joins for columns


I have a couple of questions about annotating my MatrixTable.

I have an imported VCF (ds), and I want to annotate it with values from a file that has samples as columns:

PC1 0.0154 0.0139 0.0145 -0.0728 …
PC2 -0.0093 -0.0097 -0.0093 -0.0077 …
PC3 0.0107 0.0067 0.0327 -0.0044 …
PC4 -0.0038 -0.0144 0.0056 -0.0146 …
PC5 -0.0083 0.0296 0.0510 0.0063 …

using the command:

covs_kt = hl.import_matrix_table(anno_path, entry_type=hl.tfloat64, row_fields={'ID': hl.tstr}, row_key='ID')

Now SAMP-A, SAMP-B, etc. are values of both ds.s and covs_kt.col_id.

I have a couple of questions at this stage:
In Hail 0.1, I had to export covs_kt to pandas, transpose it, convert it back to a KeyTable, and then run annotate_samples_table(covs_kt, root='sa.covs').

  1. I now understand that index-based joins are possible for rows, but it seems they’re not possible for columns, so covs_kt[ds.s] will not work unless I transpose covs_kt. Am I correct?
  2. What would be a good way to run annotate_cols with this MatrixTable? Eventually I want to be able to annotate the columns with all ~20,000 expression values. How can I do this based on all rows (or columns) of covs_kt? (This bit wasn’t clear from the documentation for the annotate_cols() function.)

Thank you for your help!

For 1, yes, you’ll need to transpose the TSV so that you can read it in as a Table whose rows are keyed by ID. Given that your table isn’t all that big, even with 20K fields, I’d recommend just saving a transposed copy. You might also want to rename covs_kt, since KeyTable is gone. Internally we’ve settled on ht for a Hail Table and mt for a MatrixTable.
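As a rough sketch of the "save a transposed copy" step, something like the following should do for a file of this size. It uses only the Python standard library; the tab delimiter, the header line starting with ID, and the two-sample excerpt are assumptions for illustration, since the post does not show the file's header.

```python
import csv
import io


def transpose_delimited(src, dst, delimiter="\t"):
    """Read a delimited table from `src` and write its transpose to `dst`."""
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(src, delimiter=delimiter) if row]
    # zip() silently truncates to the shortest row, so check the shape first.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have differing numbers of fields")
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    writer.writerows(zip(*rows))


# Hypothetical excerpt: samples as columns, one covariate per row.
original = (
    "ID\tSAMP-A\tSAMP-B\n"
    "PC1\t0.0154\t0.0139\n"
    "PC2\t-0.0093\t-0.0097\n"
)

out = io.StringIO()
transpose_delimited(io.StringIO(original), out)
transposed = out.getvalue()  # one row per sample, one column per covariate
```

With real files, pass open file handles instead of the StringIO objects. After transposing, the first column holds the sample IDs, so it can serve as the Table key.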

Were your table huge, you could read it in as a MatrixTable, create a BlockMatrix using BlockMatrix.from_entry_expr, transpose the BlockMatrix, and convert that back to a Table or MatrixTable (coming soon!), which could be used directly and/or exported row-wise.

For 2, see the ** trick for adding all fields in the “Adding column fields” section of the Overview Tutorial.
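A minimal sketch of that trick, assuming a transposed, tab-separated file with one row per sample and the sample ID in a column named ID. The function name, its parameters, and the column name are hypothetical; hl.import_table and annotate_cols are the Hail 0.2 calls.

```python
def annotate_cols_from_table(mt, table_path, key_field="ID"):
    """Add every field of a sample-keyed table as a column field of `mt`."""
    # Imported here so the sketch can be loaded without Hail installed.
    import hail as hl

    # impute=True asks Hail to infer numeric types instead of reading strings.
    ht = hl.import_table(table_path, impute=True, key=key_field)

    # ht[mt.s] joins on the sample ID and yields a struct of all non-key
    # fields; ** unpacks that struct into one column field per table field.
    return mt.annotate_cols(**ht[mt.s])
```

Usage would look like ds = annotate_cols_from_table(ds, transposed_path), where transposed_path is a placeholder for wherever the transposed copy was saved. To keep the covariates nested under a single field instead, closer to the old root='sa.covs', ds.annotate_cols(covs=ht[ds.s]) should work as well.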

Thank you, that is helpful!