About sparse matrix table

mhebrard · February 6, 2020, 3:49am

Hi, I am interested in trying out sparse matrix tables, but I am a bit confused about the workflow to properly load the data.

1> I noticed that transform_gvcf should be run on a mt with only 1 column. That means if I have an individual gvcf I am good to go, but if I have a gvcf with multiple individuals, I need to select each column and transform it into its own sparse mt… am I right?

2> If I get it right, combine_gvcf takes a list of sparse mt. That means if I get a sparse mt for each of my individuals from Q1, then I can combine them into a single sparse mt using that function… am I right?

3> Can I run combine_gvcf incrementally? Let's say I have 3 individuals. First I use transform_gvcf to turn individual A into A.smt and B into B.smt. Then I combine_gvcf [A.smt, B.smt] into all.smt. Then I transform_gvcf individual C into C.smt. Can I combine_gvcf [all.smt, C.smt] into all.smt?

Thanks

chrisvittal · February 6, 2020, 3:17pm

Thank you for your interest! To answer your questions:

1. Sort of. The issue here is the correctness of INFO fields. The transform_gvcf method copies every INFO field (except DP and END) into a matrix table entry field called gvcf_info. Recomputing those INFO fields requires an aggregation over those entries. If you have a field such as VAR_DP, which would be aggregated back with a sum, splitting a multi-sample gVCF column by column gives a result that is too large, because the original VAR_DP is copied once per sample.
2. Correct.
3. Yes! The inputs and outputs of combine_gvcfs are sparse matrix tables, so samples can be added incrementally. Be aware that each combine reads and writes all of the data, so it can be costly in terms of compute time.
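
As a rough illustration of the answers to questions 1 and 3, here is a toy sketch in plain Python. It does not call Hail at all: plain dicts stand in for rows and entries, and `copy_info_into_entry`, `toy_combine`, the loci and the depth values are all made up for the example. It also ignores everything a real combine has to handle (allele merging, reference blocks, and so on). It only shows why summing a copied INFO field over-counts, and why adding a sample later ends up with the same sites and samples as combining everything at once.

```python
# Toy model only: plain dicts stand in for matrix table rows and entries.
# No Hail functions are called; every name here is hypothetical.

# --- Question 1: why splitting a multi-sample gVCF over-counts INFO fields ---

# One site from a hypothetical 3-sample gVCF. INFO is stored once per site.
site_info = {"VAR_DP": 30}
samples = ["A", "B", "C"]

def copy_info_into_entry(info):
    """Mimic copying site-level INFO into a per-sample entry field."""
    return {"gvcf_info": dict(info)}

# Transforming one column at a time copies the same INFO into every sample.
entries = {s: copy_info_into_entry(site_info) for s in samples}

# Recomputing the site-level value with a sum over entries over-counts it.
recomputed = sum(e["gvcf_info"]["VAR_DP"] for e in entries.values())
print(recomputed)  # 90, three times the original 30

# --- Question 3: combining incrementally gives the same sites and samples ---

def toy_combine(tables):
    """Merge tables shaped {locus: {sample: entry}} by unioning columns."""
    merged = {}
    for table in tables:
        for locus, columns in table.items():
            merged.setdefault(locus, {}).update(columns)
    return merged

# Sparse per-sample tables: a sample simply has no entry at loci it lacks.
a = {"chr1:100": {"A": {"DP": 12}}, "chr1:250": {"A": {"DP": 9}}}
b = {"chr1:100": {"B": {"DP": 20}}}
c = {"chr1:250": {"C": {"DP": 7}}, "chr1:400": {"C": {"DP": 15}}}

all_ab = toy_combine([a, b])            # first batch
incremental = toy_combine([all_ab, c])  # add C later; walks all of all_ab again
one_shot = toy_combine([a, b, c])

print(incremental == one_shot)  # True
```

Note that the incremental step still iterates over every row of `all_ab`, which mirrors the cost caveat above: adding one sample means reading and rewriting the whole combined table.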
