How to use impute_sex() on a run_combiner() sparse MatrixTable?

Hello! I have been struggling to make impute_sex() work as expected. A few days ago, I was having problems with run_combiner(), but I finally got it to work with my gVCFs by updating to Hail version 0.2.67. Now, reading the documentation, I see that run_combiner() generates a Sparse MatrixTable, so I have to convert the LGT entries into GT. Apparently, hl.experimental.sparse_split_multi() should suffice, but I am getting an error.
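As a rough illustration of what converting LGT into GT involves: in the sparse format, each entry stores its call in LGT using local allele indices, and an LA array maps those local indices back to the row's global alleles. The sketch below shows only that index mapping in plain Python. It is not Hail's implementation, and the function name `local_to_global_call` is hypothetical.

```python
def local_to_global_call(lgt, la):
    """Map a call given in local allele indices to global allele indices.

    lgt: tuple of local allele indices (one per chromosome copy)
    la:  list mapping each local index to a global allele index
    """
    return tuple(la[i] for i in lgt)


# Hypothetical entry: the sample only "sees" the reference (global index 0)
# and the third allele of the row (global index 2), so LA = [0, 2].
print(local_to_global_call((0, 1), [0, 2]))  # (0, 2)
```

In this example a local heterozygous call 0/1 becomes the global call 0/2, because local allele 1 refers to global allele 2.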

Also, the documentation mentions that I should “densify” the sparse MatrixTable, but I could not understand how hl.experimental.sparse_split_multi() and densify() should work together. Perhaps the documentation could cover this function in a little more detail. So far, this is my script:

Imports:
import hail as hl

Parameters:
AAF_THRESHOLD = 0.001
MALE_THRESHOLD = 0.8
FEMALE_THRESHOLD = 0.5

Load a sparse MatrixTable, output from run_combiner():
mt = hl.read_matrix_table("dataset.mt")

Split multiallelic variants:
mt_split = hl.experimental.sparse_split_multi(mt, filter_changed_loci=True)

Filter for biallelic SNPs:
mt_split = mt_split.filter_rows((hl.len(mt_split.alleles) == 2) & hl.is_snp(mt_split.alleles[0], mt_split.alleles[1]))

Try to run impute_sex():
sex_ht_imputed = hl.impute_sex(
    mt_split.GT,
    aaf_threshold=AAF_THRESHOLD,
    male_threshold=MALE_THRESHOLD,
    female_threshold=FEMALE_THRESHOLD,
    aaf=None
)

Compute, try to check the output:
sex_ht_imputed.show()

I am getting this error:
FatalError: HailException: array index out of bounds: index=0, length=0

Am I missing something? If anybody could give some tips I would be very thankful.

The sparse MatrixTable that’s produced by the VCF combiner definitely doesn’t have enough utilities or docs built up around it. We’re making a plan to fix that.

Is there more to the stack trace? That would help us figure out where the invalid array reference is coming from.

Thanks, here is the complete error message:

hail_error.txt (22.9 KB)

OK, the error is coming from sparse_split_multi here:

  File "/home/antonio/miniconda3/envs/hail/lib/python3.7/site-packages/hail/experimental/vcf_combiner/sparse_split_multi.py", line 183, in <lambda>
    .map(lambda idx: old_entry.LPL[idx])))))

I have a hypothesis – does your dataset have haploid calls for sex chromosomes?

Hello, thanks for the feedback. Yes, it has haploid calls on the sex chromosomes and the mitochondrial genome.
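To illustrate why haploid calls could plausibly trip up code that indexes into LPL: under the VCF convention, a diploid genotype j/k (with j <= k) has its likelihood at index k*(k+1)/2 + j, so n alleles give n*(n+1)/2 values, whereas a haploid call carries only n values. The plain-Python sketch below shows how a diploid-style lookup can run past the end of a haploid likelihood array. It is an illustration of this class of mismatch only, not Hail's sparse_split_multi code, it does not reproduce the exact error above, and the function names are hypothetical.

```python
def n_diploid_genotypes(n_alleles):
    """Number of unordered diploid genotypes for n alleles."""
    return n_alleles * (n_alleles + 1) // 2


def diploid_pl_index(j, k):
    """Index of diploid genotype j/k in a VCF-ordered PL array."""
    j, k = sorted((j, k))
    return k * (k + 1) // 2 + j


def lookup_hom_alt(pl, alt):
    """Read the likelihood of the alt/alt genotype, assuming diploid layout."""
    return pl[diploid_pl_index(alt, alt)]


diploid_pl = [0, 30, 60]  # 0/0, 0/1, 1/1 for two alleles
haploid_pl = [0, 30]      # 0, 1 for two alleles (made-up values)

print(lookup_hom_alt(diploid_pl, 1))  # 60

try:
    lookup_hom_alt(haploid_pl, 1)
except IndexError as err:
    # The diploid index (2) does not exist in the shorter haploid array.
    print("haploid lookup failed:", err)
```

If the dataset does contain haploid entries, any step that assumes the diploid layout for the likelihood array would need to treat them separately.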