In our use case, we have a large (10TiB+) MatrixTable saved to blob storage. We would like to update it by adding row and column annotations in subsequent stages, saving them back to the original MatrixTable to avoid duplicating the (large) genotype entries. Is this viable with the current design, or would it be better to maintain separate keyed Tables in storage and join them in subsequent stages of our analysis workflow?
The latter is definitely the thing to do. There are currently no plans to support in-place updates to MatrixTable/Table files, so storing variant and sample metadata as separate keyed Tables and joining them when necessary is the most efficient option.
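For illustration, here is a minimal sketch of what that join could look like. The function name, the paths, and the `variant_meta` / `sample_meta` field names are hypothetical, and it assumes the side Tables are keyed the same way as the MatrixTable's rows and columns.

```python
def load_annotated(mt_path, row_ht_path, col_ht_path):
    """Read the large MatrixTable and attach annotations kept in separate Tables.

    Hypothetical helper: all three paths are placeholders for your own storage.
    """
    # Imported inside the function so that defining it does not need a Hail session.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)   # the big entries are only read, never rewritten
    row_ht = hl.read_table(row_ht_path)  # assumed keyed like mt rows, e.g. (locus, alleles)
    col_ht = hl.read_table(col_ht_path)  # assumed keyed like mt columns, e.g. (s)

    # Join by key; rows or columns absent from a side Table get a missing value.
    mt = mt.annotate_rows(variant_meta=row_ht[mt.row_key])
    mt = mt.annotate_cols(sample_meta=col_ht[mt.col_key])
    return mt
```

A stage that computes new annotations would then write only the small keyed Table, for example something along the lines of `mt.rows().select('my_new_field').write(row_ht_path)` (with `my_new_field` standing in for whatever that stage produces), and later stages would call a helper like the one above rather than rewriting the MatrixTable.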
Got it. Thanks for the rapid reply.