Writing MatrixTable on DNAnexus extremely slow + “server connection failed” at the end

Hi everyone,
I’m working with UK Biobank exome sequences (450k individuals) on DNAnexus using Hail, and I’m having persistent issues when trying to write a MatrixTable.

My goal is to create the MatrixTable so I can later run the full QC pipeline. However, the write step takes many hours and then fails at the end with a “server connection failed” error.

Cluster / Job Setup

I first used this cluster configuration:

  • Instance type: mem1_hdd1_v2_x16

  • Initial worker count: 16
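For scale, the total core count of this setup can be estimated from the instance name. This is only a rough sketch: it assumes the trailing `_xN` in a DNAnexus instance type is the number of cores per node and that every counted node contributes all of its cores. The helper names `cores_per_instance` and `total_cores` are made up for illustration.

```python
import re


def cores_per_instance(instance_type: str) -> int:
    """Return N from a name ending in _xN (assumed here to be the core count)."""
    match = re.search(r"_x(\d+)$", instance_type)
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    return int(match.group(1))


def total_cores(instance_type: str, node_count: int) -> int:
    """Rough upper bound on cores across the cluster (assumption: all nodes count)."""
    return cores_per_instance(instance_type) * node_count


print(total_cores("mem1_hdd1_v2_x16", 16))  # 256
```

So the configuration above gives at most about 256 cores for the write step.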

These are all the available instance types:

mem1_hdd1_x2
mem1_hdd1_x4
mem1_hdd1_x8
mem1_hdd1_x16
mem1_hdd1_x36
mem1_hdd1_v2_x2
mem1_hdd1_v2_x4
mem1_hdd1_v2_x8
mem1_hdd1_v2_x16
mem1_hdd1_v2_x36
mem1_hdd1_v2_x72
mem1_hdd1_v2_x96
mem1_ssd1_x2
mem1_ssd1_x4
mem1_ssd1_x8
mem1_ssd1_x16
mem1_ssd1_x32
mem1_ssd1_x36
mem1_ssd1_v2_x2
mem1_ssd1_v2_x4
mem1_ssd1_v2_x8
mem1_ssd1_v2_x16
mem1_ssd1_v2_x36
mem1_ssd1_v2_x72
mem1_ssd2_x2
mem1_ssd2_x4
mem1_ssd2_x8
mem1_ssd2_x16
mem1_ssd2_x36
mem1_ssd2_v2_x2
mem1_ssd2_v2_x4
mem1_ssd2_v2_x8
mem1_ssd2_v2_x16
mem1_ssd2_v2_x36
mem1_ssd2_v2_x72
mem1_hdd2_x1
mem1_hdd2_x8
mem1_hdd2_x32
mem2_ssd1_x2
mem2_ssd1_x4
mem2_ssd1_x8
mem2_ssd1_v2_x2
mem2_ssd1_v2_x4
mem2_ssd1_v2_x8
mem2_ssd1_v2_x16
mem2_ssd1_v2_x32
mem2_ssd1_v2_x48
mem2_ssd1_v2_x64
mem2_ssd1_v2_x96
mem2_ssd2_x2
mem2_ssd2_x4
mem2_ssd2_x8
mem2_ssd2_x16
mem2_ssd2_x40
mem2_ssd2_x64
mem2_ssd2_v2_x2
mem2_ssd2_v2_x4
mem2_ssd2_v2_x8
mem2_ssd2_v2_x16
mem2_ssd2_v2_x32
mem2_ssd2_v2_x48
mem2_ssd2_v2_x64
mem2_ssd2_v2_x96
mem2_hdd2_x1
mem2_hdd2_x2
mem2_hdd2_x4
mem2_hdd2_v2_x2
mem2_hdd2_v2_x4
mem3_ssd1_x2
mem3_ssd1_x4
mem3_ssd1_x8
mem3_ssd1_x16
mem3_ssd1_x32
mem3_ssd1_v2_x2
mem3_ssd1_v2_x4
mem3_ssd1_v2_x8
mem3_ssd1_v2_x16
mem3_ssd1_v2_x32
mem3_ssd1_v2_x48
mem3_ssd1_v2_x64
mem3_ssd1_v2_x96
mem3_ssd2_x4
mem3_ssd2_x8
mem3_ssd2_x16
mem3_ssd2_x32
mem3_ssd2_v2_x2
mem3_ssd2_v2_x4
mem3_ssd2_v2_x8
mem3_ssd2_v2_x16
mem3_ssd2_v2_x32
mem3_ssd2_v2_x64
mem3_ssd3_x2
mem3_ssd3_x4
mem3_ssd3_x8
mem3_ssd3_x12
mem3_ssd3_x24
mem3_ssd3_x48
mem3_ssd3_x96
mem3_hdd2_x2
mem3_hdd2_x4
mem3_hdd2_x8
mem3_hdd2_v2_x2
mem3_hdd2_v2_x4
mem3_hdd2_v2_x8
mem4_ssd1_x128

Am I using the right instance type for the write() step, and which one should I use later when I want to do quality control?

This is my code:

import hail as hl
import dxpy

hl.init()

# Import the exome VCFs from the mounted project
VCFs_path = "file:///mnt/project/…"
mt = hl.import_vcf(
    VCFs_path,
    force_bgz=True,
    reference_genome="GRCh38",
    array_elements_required=False,
)

# Look up the DNAnexus database ID and build the output URL
db_name = "x"
mt_name = "y.mt"
db_uri = dxpy.find_one_data_object(name=db_name, classname="database")["id"]
url = f"dnax://{db_uri}/{mt_name}"

# Write the MatrixTable, then read it back for the QC steps
mt.write(url)
mt = hl.read_matrix_table(url)
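
To rule out a malformed output path, the URL construction can be checked on its own, without Hail or a cluster. This is a minimal sketch, assuming the `dnax://<database-id>/<table-name>` form used above. The helper `build_dnax_url` and the database ID in the example are hypothetical placeholders, not real values.

```python
def build_dnax_url(database_id: str, table_name: str) -> str:
    """Build a dnax:// URL for a MatrixTable inside a DNAnexus database."""
    if not database_id:
        raise ValueError("database_id must not be empty")
    if not table_name.endswith(".mt"):
        raise ValueError("expected a MatrixTable name ending in .mt")
    return f"dnax://{database_id}/{table_name}"


# "database-XXXX" is a placeholder, not a real database ID
print(build_dnax_url("database-XXXX", "y.mt"))
```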