Densify() operation

Hello Hail team,

I am trying to understand the best way to work with my sparse matrix table.

My understanding is that before applying any filters, I should use hl.experimental.densify() to convert the sparse mt to a dense mt. When I do this, however, any count, write, or show operation goes from ~1100 steps to ~69,000 steps. Because of this, a single count() or write() takes hours to run.

What can I do to make this more efficient?


An example script might look like:
import argparse

import hail as hl

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--full_run", action="store_true", help="Runs on chr22 and chrX only by default. If full_run is set, it runs on the whole matrix. WARNING: This will be VERY expensive")
parser.add_argument("-w", "--overwrite", action="store_true", help="If set will overwrite output matrix if it already exists")

requiredNamed = parser.add_argument_group("required named arguments")
requiredNamed.add_argument("-i", "--input_mt_path", required=True)
requiredNamed.add_argument("-o", "--output_mt_path", required=True)
#requiredNamed.add_argument("-p", "--requester_pays_project_id", help="Project ID to bill to when accessing requester pays bucket, needed to access hail annotationDB")

args = parser.parse_args()

# Store inputs
input_mt_path = args.input_mt_path
output_mt_path = args.output_mt_path
#requester_pays_project_id = args.requester_pays_project_id

# Read the sparse mt and densify it
mt = hl.read_matrix_table(input_mt_path)
mt = hl.experimental.densify(mt)

# Save the densified mt (the CHR22/PPMI filter is not applied in this example)
mt.write(output_mt_path, overwrite=args.overwrite)
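
For reference, here is a minimal sketch of how the --full_run flag could gate the chr22/chrX restriction that its help text describes, since the script above never uses the flag. The helper names contigs_for_run and restrict_contigs are hypothetical, and the GRCh38 contig names ("chr22", "chrX") are an assumption about the dataset. The sketch does not settle whether the restriction should run before or after densify(); that is still the open question.

```python
# Assumption: GRCh38-style contig names; GRCh37 would use "22" and "X".
DEFAULT_CONTIGS = ["chr22", "chrX"]


def contigs_for_run(full_run):
    """Return the contigs to keep, or None to keep the whole matrix."""
    return None if full_run else list(DEFAULT_CONTIGS)


def restrict_contigs(mt, contigs, reference_genome="GRCh38"):
    """Keep only rows on the given contigs; return mt unchanged if contigs is None."""
    if contigs is None:
        return mt
    # Imported lazily so contigs_for_run can be used without Hail installed.
    import hail as hl

    # A bare contig name parses to an interval covering the whole contig.
    intervals = [
        hl.parse_locus_interval(contig, reference_genome=reference_genome)
        for contig in contigs
    ]
    return hl.filter_intervals(mt, intervals)
```

In the script this would be called as mt = restrict_contigs(mt, contigs_for_run(args.full_run)), placed either before or after the densify() call.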