Is it possible to use a matrix table with a database tool?

I am not sure which category this question actually belongs in. What concerns me is that I seem to need to read the matrix table into memory, whereas I would like to query it like a database and get back data by rsid, locus, and so on.

import hail as hl

mtx = hl.read_matrix_table("MTTEST.mt")

It seems I can’t query mtx without reading it first. Only then can I do:

mtx.filter_rows(mtx.locus == hl.locus(contig="5", pos=12114, reference_genome="GRCh37")).rsid.show()

Is it possible to keep the matrix data in a database? And what should I do with big matrices: should I split them into several “parts” and run the same query in a loop over the parts?

Hi @annalisasnow ,

hl.read_matrix_table does not put the matrix table into memory. Most matrix tables are far too large to fit in memory. Instead, it streams through the data one “partition” at a time. A “partition” is a group of rows.
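You can check how the data is laid out without triggering a full read. A minimal sketch, assuming the same MTTEST.mt file from above:

import hail as hl

mtx = hl.read_matrix_table("MTTEST.mt")

# Number of partitions Hail will stream through, one at a time
print(mtx.n_partitions())

# Counting rows streams through the partitions; it does not load the table into memory
print(mtx.count_rows())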

Hail reads only a very small amount of data when you use filter_rows with == on one of the matrix table’s row keys. I would expect your example to read exactly one row’s worth of data from disk.
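For instance, hl.filter_intervals expresses the same kind of key-based lookup, and lets Hail read only the partitions that overlap the interval rather than scanning the whole table. A sketch, again assuming your MTTEST.mt:

import hail as hl

mtx = hl.read_matrix_table("MTTEST.mt")

# A small interval containing position 12114 on contig 5
interval = hl.parse_locus_interval("5:12114-12115", reference_genome="GRCh37")

# Only partitions overlapping the interval are read from disk
hl.filter_intervals(mtx, [interval]).rsid.show()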

Do you experience high latency when executing your example?

Also, it’s important to understand that most Hail methods are lazy. For example, hl.read_matrix_table doesn’t actually read anything into memory; it only begins to build a description of a pipeline. Only when you run a method like mtx.show, which requires a result to be computed, is the entire pipeline compiled and executed. In your example, that pipeline says that only a single locus needs to be read, and the Hail compiler is able to see that and read only a single row from disk, as @danking said.
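To make that concrete, here is a sketch of the same lookup written out step by step (same file and locus as your example); nothing is computed until the final show call:

import hail as hl

mtx = hl.read_matrix_table("MTTEST.mt")  # builds a pipeline description, loads nothing into memory

# Each step below only extends the pipeline description
filtered = mtx.filter_rows(
    mtx.locus == hl.locus(contig="5", pos=12114, reference_genome="GRCh37")
)
rsids = filtered.rsid

# Only now is the pipeline compiled and executed, reading a single row from disk
rsids.show()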


Yes, I guess so, to a certain extent. There’s also some odd behaviour when importing VCF and PLINK files based on the same data: the VCF is tolerated, while the PLINK import has a memory issue. I guess I should start another thread for that.