What spec could cloud read 3.5 TB matrix table?

Hi, I am studying ancestry with gnomAD.

This bucket is the matrix table I want to load.

3.5T gnomad.genomes.v3.1.2.hgdp_1kg_subset_dense.mt [1]

What spec can cloud read 3.5T matrix table?

It means what spec of cloud or spark cluster load mt and run hail mt functions in the expected return time of the process.

Thank you for your attention!

[1] Google Cloud Platform

That depends a lot on what you want to run. I think using on the order of 500 cores (60 n1-standard-8 preemptible workers) would be a good place to start.

1 Like