ch-kr
September 29, 2020, 4:31pm
1
Hi hail team,
I ran into a ClassTooLargeException while running build_models on this input:
gs://gnomad-public/papers/2019-flagship-lof/v1.0/model/prop_observed_by_coverage_no_common_pass_filtered_bins.ht
Do you have any suggestions for how to fix this?
can we have the log file?
ch-kr
September 29, 2020, 4:47pm
3
yes, just sent it to the hail-team email. thanks for responding so quickly!
Looks like this is creating an enormous query:
    Calibrates high coverage model (returns intercept and slope)
    """
    # TODO: try square weighting
    ht = ht.annotate(high_coverage_proportion_observed=ht.observed_variants / ht.possible_variants)
    return ht.aggregate(hl.agg.group_by(ht.cpg,
                                         hl.agg.linreg(ht.high_coverage_proportion_observed, [1, ht.mu_snp],
                                                       weight=ht.possible_variants if weighted else None)
                                         ).map_values(lambda x: x.beta))


def build_plateau_models_pop(ht: hl.Table, weighted: bool = False) -> Dict[str, Tuple[float, float]]:
    """
    Calibrates high coverage model (returns intercept and slope)
    """
    pop_lengths = get_all_pop_lengths(ht)
    agg_expr = {
        pop: [hl.agg.group_by(ht.cpg,
                              hl.agg.linreg(ht[f'observed_{pop}'][i] / ht.possible_variants, [1, ht.mu_snp],
                                            weight=ht.possible_variants if weighted else None)
                              ).map_values(lambda x: x.beta) for i in range(length)]
        for length, pop in pop_lengths
I'm looking into ways we might work around this.
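In the meantime, one thing that might be worth trying (an untested assumption, not a confirmed fix) is splitting the per-population aggregations into smaller batches, so that each aggregate call compiles fewer linreg aggregators at once. The helper names below (`chunked`, `aggregate_in_batches`, `run_batch`) are made up for illustration, and the demo uses a plain-Python stand-in instead of a real table:

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List


def chunked(items: Dict[str, Any], size: int) -> Iterator[Dict[str, Any]]:
    """Yield successive sub-dicts holding at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items.items())
    while True:
        batch = dict(islice(it, size))
        if not batch:
            return
        yield batch


def aggregate_in_batches(
    agg_exprs: Dict[str, Any],
    run_batch: Callable[[Dict[str, Any]], Dict[str, Any]],
    batch_size: int = 4,
) -> Dict[str, Any]:
    """Run `run_batch` once per batch and merge the per-batch results."""
    results: Dict[str, Any] = {}
    for batch in chunked(agg_exprs, batch_size):
        results.update(run_batch(batch))
    return results


# Stand-in for the real aggregation so this sketch runs without Hail.
# With Hail, `run_batch` could hypothetically be something like:
#   lambda batch: dict(ht.aggregate(hl.struct(**batch)))
calls: List[int] = []


def fake_run(batch: Dict[str, int]) -> Dict[str, int]:
    calls.append(len(batch))  # record how many expressions each call received
    return {name: value * 2 for name, value in batch.items()}


merged = aggregate_in_batches({f"pop{i}": i for i in range(10)}, fake_run, batch_size=4)
```

The trade-off is that each batch is a separate pass over the table, so a smaller `batch_size` means smaller generated code per query but more total passes.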