Hi,
I am testing QC steps and plot generation on a Hail matrix table. The QC steps appear to run, but when I try to write the matrix table to disk, I get the following error:
Hail version: 0.2.72-cfce5e858cab
Error summary: HailException: call_stats: n_alleles may not be missing
Has anyone faced this error before?
Hail doesn’t actually compute anything until you run something like a write to disk, which forces a result to be computed. (Then the Hail compiler can optimize the pipeline in its entirety, and schedule the distributed computation.) That means the timing of the exception doesn’t tell you much about what the problematic line was.
In this case, the error is definitely in one of the QC steps, in the call_stats aggregator. It looks like you may have a missing alleles array. If your QC script is relatively small and you don’t mind sharing it, that would be helpful.
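To illustrate the lazy-evaluation point above, here is a toy sketch in plain Python. It is not how Hail is implemented; `LazyTable`, `map_rows`, `collect`, and `count_alleles` are hypothetical names made up for this example. It only shows why an error raised by an earlier step can first appear at the final action.

```python
# Toy model of lazy evaluation. This is NOT Hail's implementation;
# every name here is hypothetical and exists only for illustration.
class LazyTable:
    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def map_rows(self, fn):
        # Only record the step; nothing is computed here.
        return LazyTable(self._rows, self._steps + (fn,))

    def collect(self):
        # The "action": run every recorded step over every row.
        out = []
        for row in self._rows:
            for fn in self._steps:
                row = fn(row)
            out.append(row)
        return out


def count_alleles(row):
    # Stand-in for a QC step that fails when the alleles field is missing.
    if row["alleles"] is None:
        raise ValueError("n_alleles may not be missing")
    return {**row, "n_alleles": len(row["alleles"])}


rows = [{"alleles": ["A", "T"]}, {"alleles": None}]
qc = LazyTable(rows).map_rows(count_alleles)  # no error yet: step is only recorded

try:
    qc.collect()  # the error from count_alleles surfaces here
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

In this sketch the bad row is introduced before `map_rows` is called, but the exception is only raised inside `collect()`, which mirrors an error from a QC step showing up during the write.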
These are the set of steps:
mt = hl.variant_qc(mt, name='variant_qc')
mt = hl.sample_qc(mt, name='sample_qc')
Here is the output of mt.describe() for the input Hail matrix table:
----------------------------------------
Global fields:
    'bn': struct {
        n_populations: int32,
        n_samples: int32,
        n_variants: int32,
        n_partitions: int32,
        pop_dist: array<int32>,
        fst: array<float64>,
        mixture: bool
    }
    'cohort_wrapper_type': str
----------------------------------------
Column fields:
    'sample_idx': int32
    'pop': int32
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'ancestral_af': float64
    'af': array<float64>
    'row_idx': int64
----------------------------------------
Entry fields:
    'GT': call
----------------------------------------
Column key: ['sample_idx']
Row key: ['locus', 'alleles']
----------------------------------------
I verified that this exception occurs when an alleles array is missing. I would guess that is what’s happening here. You can check using:
mt.aggregate_rows(hl.agg.any(hl.is_missing(mt.alleles)))
Yes, the query returned True, meaning at least one row has a missing alleles array.