Very slow write and count operations after densify

Hello Hail team!

I am attempting to read in a matrix table, perform a densify operation, then filter to a subset of the data and mt.write(). The filter operations are being performed quickly, as I’d expect, but the mt.write() operation has been running for over an hour and is still nowhere near done.

The original mt is huge: count() returns (2736925182, 7783). Before the write operation, however, we filter it down to just chr22 for ~500 of these individuals.
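For a sense of scale, the dimensions above can be turned into a rough entry count. This is only a back-of-the-envelope sketch. The variable names are illustrative, and it assumes a fully dense matrix holds one entry per (row, sample) pair, so it is an upper bound rather than a measurement of what Hail actually stores.

```python
# Dimensions reported by mt.count() in this post: (rows, columns)
n_rows = 2_736_925_182
n_cols = 7_783

# Upper bound on the number of entries in a fully dense matrix
total_entries = n_rows * n_cols

# Fraction of samples kept if roughly 500 individuals are retained
n_kept = 500  # approximate figure from the post
kept_fraction = n_kept / n_cols

print(f"dense entries (upper bound): {total_entries:.3e}")
print(f"fraction of samples kept:    {kept_fraction:.1%}")
```

If the densify step has to touch every sample before the column filter is applied, most of that work would be for samples that are later discarded.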

Is there any way to speed up the code?

import hail as hl
import argparse

import numpy as np
import pandas as pd
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

mt = hl.read_matrix_table('gs://')
mt = hl.experimental.densify(mt)

# filter to chr22

intervals = ['chr22']
mt = hl.filter_intervals(mt, [hl.parse_locus_interval(x, reference_genome='GRCh38') for x in intervals])

#filter to test individuals
ppmi_ids = df[0]
samples_to_keep = set(ppmi_ids)
set_to_keep = hl.literal(samples_to_keep)
test_data = mt.filter_cols(set_to_keep.contains(mt.meta.external_id))

# Save mt densified and filtered to CHR22/PPMI

test_data.write('gs://dataproc-staging-us-east1-942231253036-bw4veo0a/', overwrite=True)
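
One possible variation on the pipeline above, offered as a sketch and not a verified fix, is to apply both filters before densifying so that densify only processes chr22 and the retained samples. The function name and its three parameters are hypothetical. It also assumes that reference blocks in the sparse matrix table never span contigs, so filtering to a whole chromosome first would not change the densified result. Hail is imported inside the function only so that the definition can be loaded without a running cluster.

```python
def write_chr22_subset(input_path, output_path, sample_ids):
    """Filter first, then densify, then write (hypothetical helper)."""
    import hail as hl  # lazy import: needs a configured Hail environment when called

    mt = hl.read_matrix_table(input_path)

    # Restrict rows to chr22 before any expensive work
    chr22 = hl.parse_locus_interval('chr22', reference_genome='GRCh38')
    mt = hl.filter_intervals(mt, [chr22])

    # Restrict columns to the samples of interest, same field as in the post
    keep = hl.literal(set(sample_ids))
    mt = mt.filter_cols(keep.contains(mt.meta.external_id))

    # Densify only the remaining rows and columns (assumed equivalent, see above)
    mt = hl.experimental.densify(mt)

    mt.write(output_path, overwrite=True)
```

It would be called with the same input path, output bucket, and ID list used above, for example `write_chr22_subset(input_path, output_path, ppmi_ids)`.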