I’m trying to measure, or at least estimate, the sizes of objects in a Hail application, but cached/persisted hail.Table and hail.MatrixTable objects never show up in the Spark UI’s Storage tab.
After following the directions for running Hail on a Spark cluster, I’m running Hail 0.2 on EMR 5.16.0 with Spark 2.3.1.
Based on the example in the linked document, I run the following:
In [1]: import hail as hl

In [2]: mt = hl.balding_nichols_model(3, 100, 100)

In [3]: mt.cache()
Out[3]: <hail.matrixtable.MatrixTable at 0x7fd46f0fbcc0>

In [4]: mt.aggregate_entries(hl.agg.mean(mt.GT.n_alt_alleles()))
Out[4]: 1.0444
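To check whether anything is being registered with Spark’s block manager at all, the storage info can also be queried directly through py4j. This is a minimal sketch, assuming hl.spark_context() returns the underlying pyspark SparkContext (as I understand it does in Hail 0.2) and using Spark’s developer-API method getRDDStorageInfo:

import hail as hl

hl.init()
mt = hl.balding_nichols_model(3, 100, 100)
mt = mt.cache()
mt.aggregate_entries(hl.agg.mean(mt.GT.n_alt_alleles()))

# Reach the Scala SparkContext via py4j and list every RDD that has been
# marked for persistence; memSize/diskSize are reported in bytes.
sc = hl.spark_context()
for info in sc._jsc.sc().getRDDStorageInfo():
    print(info.name(), info.numCachedPartitions(), info.memSize(), info.diskSize())

If that list is also empty, the Storage tab is presumably just reflecting reality, and the question becomes how Hail’s cache()/persist() map onto Spark storage.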
However, the ‘Storage’ tab of the Spark UI remains empty, even though the ‘Jobs’ tab updates as expected when each action runs.
Is this expected? Are there other recommended ways of measuring the memory footprint of Hail Tables and MatrixTables?
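For reference, here is a sanity check that should populate the Storage tab, since it goes through ordinary RDD persistence on the same SparkContext (a sketch; the RDD name ‘sanity-check’ is just illustrative):

import hail as hl
from pyspark import StorageLevel

hl.init()
sc = hl.spark_context()

# A plain RDD persisted through the same context should appear in the
# Storage tab once an action materializes it; if it does, the UI itself
# works and the question is specifically about Hail's cache()/persist().
rdd = sc.parallelize(range(1000000)).setName('sanity-check')
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()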