Hi,
I have a couple of questions about annotating my MatrixTable.
I have an imported VCF (ds), and I want to annotate it based on a file that has samples as columns:
ID SAMP-A SAMP-B SAMP-C SAMP-D …
PC1 0.0154 0.0139 0.0145 -0.0728 …
PC2 -0.0093 -0.0097 -0.0093 -0.0077 …
PC3 0.0107 0.0067 0.0327 -0.0044 …
PC4 -0.0038 -0.0144 0.0056 -0.0146 …
PC5 -0.0083 0.0296 0.0510 0.0063 …
using the command:
covs_kt = hl.import_matrix_table(anno_path, entry_type=hl.tfloat64, row_fields={'ID': hl.tstr}, row_key='ID')
Now SAMP-A, SAMP-B, etc. appear as values of both ds.s and covs_kt.col_id.
I have a couple of questions at this stage:
In Hail 0.1, I had to export covs_kt to pandas, transpose it, convert it back to a KeyTable, and then run annotate_samples_table(covs_kt, root='sa.covs').
- I now understand that index-based joins are possible for rows, but it seems they are not possible for columns, so covs_kt[ds.s] will not work unless I transpose covs_kt. Am I correct?
- What would be a good way to run annotate_cols with this MatrixTable? Eventually I want to annotate the columns with all ~20,000 expression values. How can I do this based on all rows (or columns) of covs_kt? (This wasn't clear from the documentation for annotate_cols().)
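To make the layout I am after concrete, here is a minimal sketch of the transpose step in plain Python (standard library only, not Hail). It assumes the file is whitespace-delimited as shown above. The function name transpose_covariates and the toy input are just for illustration. The result has one record per sample, which is the shape I would want to key by ds.s:

```python
import io


def transpose_covariates(handle):
    """Turn a 'samples as columns' table into {sample_id: {row_id: value}}."""
    header = handle.readline().split()
    sample_ids = header[1:]  # the first header field is the 'ID' label
    per_sample = {sample: {} for sample in sample_ids}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row_id, values = fields[0], fields[1:]
        for sample, value in zip(sample_ids, values):
            per_sample[sample][row_id] = float(value)
    return per_sample


# Toy input built from the first two rows shown above
# (only the four sample columns that are visible).
toy = io.StringIO(
    "ID SAMP-A SAMP-B SAMP-C SAMP-D\n"
    "PC1 0.0154 0.0139 0.0145 -0.0728\n"
    "PC2 -0.0093 -0.0097 -0.0093 -0.0077\n"
)
covs_by_sample = transpose_covariates(toy)
print(covs_by_sample["SAMP-D"])
```

With ~20,000 rows this gives each sample a dictionary of ~20,000 values. What I am hoping for is a way to get the equivalent per-sample structure inside Hail without leaving it.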
Thank you for your help!