Adding annotations to large stored MatrixTable

awblocker · July 8, 2019, 5:11pm

In our use case, we have a large (10TiB+) MatrixTable saved to blob storage. We would like to update it by adding row and column annotations in subsequent stages, saving them back to the original MatrixTable to avoid duplicating the (large) genotype entries. Is this viable with the current design, or would it be better to maintain separate keyed Tables in storage and join them in subsequent stages of our analysis workflow?

tpoterba · July 8, 2019, 5:13pm

Keeping separate keyed Tables is definitely the thing to do. There are currently no plans to support in-place updates to MatrixTable/Table files, so storing variant/sample metadata as separate tables and joining when necessary is the most efficient option.
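
For illustration, here is a minimal sketch of that join-on-read pattern, assuming the annotations were written earlier as Tables keyed the same way as the MatrixTable's rows and columns. The helper names, the paths passed in, and the field names `variant_meta` and `sample_meta` are hypothetical; only `hl.read_matrix_table`, `hl.read_table`, `annotate_rows`, `annotate_cols`, and key-based Table indexing are Hail calls.

```python
def attach_annotations(mt, row_ht, col_ht):
    """Join stored annotation Tables onto a MatrixTable by key.

    Assumes row_ht is keyed like mt's rows and col_ht like mt's columns.
    The field names variant_meta / sample_meta are made up for this sketch.
    """
    # Indexing a Table with the MatrixTable's key expression performs the join:
    # each row/column gets the matching Table row as a struct field.
    mt = mt.annotate_rows(variant_meta=row_ht[mt.row_key])
    mt = mt.annotate_cols(sample_meta=col_ht[mt.col_key])
    return mt


def load_annotated(mt_path, row_ht_path, col_ht_path):
    """Read the large MatrixTable and its small annotation Tables, then join.

    All three paths are placeholders supplied by the caller.
    """
    # Imported here so the join helper above has no import-time dependency.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)   # genotype entries stay where they are
    row_ht = hl.read_table(row_ht_path)  # variant-level annotations
    col_ht = hl.read_table(col_ht_path)  # sample-level annotations
    return attach_annotations(mt, row_ht, col_ht)
```

Because Hail evaluates lazily, nothing is rewritten here: the genotype entries are read from the original MatrixTable and only the small annotation Tables need to be stored per stage.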

awblocker · July 8, 2019, 5:14pm

Got it. Thanks for the rapid reply.
