This thread documents breaking changes as we work toward stabilizing the development branch (i.e., master, 0.2 beta) as Hail 0.2 proper.
Removed as_array parameter from PCA
See the overview tutorial for example usage in GWAS, where
Removed dataset parameter from eight methods
All methods that took a dataset and at least one required expression on that dataset no longer take a dataset parameter at all (the dataset is implicitly the source of the expression):
Changed ys to y and schema in linear regression
Consistent with the other statistics methods, the parameter ys on linear_regression is now y, and when y is an expression the linreg fields all have type float64. This is consistent with the other regression methods.
When y is a list of expressions (even a list of one expression), the behavior is the same as before: the five y-dependent linreg fields have type
n_complete_samples is now just
See the overview tutorial for example usage of the case where y is an expression. In particular, linear_regression_results.linreg.p_value.collect() no longer takes
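The schema difference described in this section can be illustrated without Hail. The following is a minimal plain-Python sketch, not Hail code: `ols_slope` and `toy_linear_regression` are hypothetical helpers, and the single `beta` field merely stands in for the y-dependent linreg fields. It only shows the shape of the result: a scalar when y is one phenotype, a list when y is a list of phenotypes (even a list of one).

```python
def ols_slope(y, x):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def toy_linear_regression(y, x):
    """Toy stand-in: scalar result for a single y, list result for a list of y."""
    # A list of phenotype vectors mimics "y is a list of expressions".
    if y and isinstance(y[0], (list, tuple)):
        return {"beta": [ols_slope(yi, x) for yi in y]}
    # A single phenotype vector mimics "y is an expression".
    return {"beta": ols_slope(y, x)}


x = [0, 1, 2]
single = toy_linear_regression([1.0, 3.0, 5.0], x)
multi = toy_linear_regression([[1.0, 3.0, 5.0]], x)
```

Here `single["beta"]` is a plain float, while `multi["beta"]` is a one-element list, mirroring the float64 versus array-valued distinction described above.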
ld_prune has changed to take a CallExpression instead of a matrix table. The new signature is ld_prune(call_expr, r2=0.2, window=1000000, memory_per_core=256).
ld_prune no longer requires unphased genotypes (though it still makes no use of phasing information). And the parameter window has been renamed
While we’re at it, it also returns a Table with just ('locus', 'alleles') that is the set of independent variants at that threshold (rather than previously returning the MatrixTable filtered to that set).
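To make the roles of r2 and window and the new return shape concrete, here is a toy plain-Python sketch. It is not Hail's algorithm: it swaps in a simple greedy pass, and `r_squared` and `toy_ld_prune` are hypothetical names. Like the new behavior described above, it returns only (locus, alleles) keys for the retained variants rather than the filtered data.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov * cov / (var_a * var_b)


def toy_ld_prune(variants, r2=0.2, window=1000000):
    """Greedy pruning over variants sorted by position.

    variants: list of (position, alleles, genotypes) tuples.
    Returns (position, alleles) keys of the retained variants only.
    """
    kept = []
    for pos, alleles, gts in variants:
        # Drop the variant if it correlates with a kept variant inside the window.
        correlated = any(
            pos - kept_pos <= window and r_squared(gts, kept_gts) >= r2
            for kept_pos, _, kept_gts in kept
        )
        if not correlated:
            kept.append((pos, alleles, gts))
    return [(pos, tuple(alleles)) for pos, alleles, _ in kept]


variants = [
    (100, ["A", "T"], [0, 1, 2, 0, 1, 2]),
    (200, ["G", "C"], [0, 1, 2, 0, 1, 2]),        # identical to the first: pruned
    (300, ["C", "T"], [1, 0, 1, 2, 1, 1]),        # weakly correlated: kept
    (5000000, ["T", "G"], [0, 1, 2, 0, 1, 2]),    # identical but outside the window: kept
]
independent = toy_ld_prune(variants)
```

The result is a list of keys, which in the real API would then be used to filter the original dataset.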
Minor breaking change:
hl.min_rep() now returns
ArrayExpression of type str). This makes it much easier to min_rep and re-key, as in:
mt = mt.key_rows_by(**hl.min_rep(mt.locus, mt.alleles))
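The `**` unpacking in the line above works because the result has named fields that become keyword arguments. The following plain-Python sketch mimics that pattern with a dict. It is an illustration under assumptions, not Hail code: `toy_min_rep` and `toy_key_rows_by` are hypothetical stand-ins, the locus is reduced to a bare integer position, and the trimming rule shown (shared suffix first, then shared prefix) is a common way to compute a minimal representation.

```python
def toy_min_rep(position, alleles):
    """Trim bases shared by all alleles, keeping at least one base in each."""
    alleles = list(alleles)
    # Trim a shared trailing base while every allele stays non-empty.
    while min(len(a) for a in alleles) > 1 and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Trim a shared leading base, shifting the position to the right.
    while min(len(a) for a in alleles) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        position += 1
    return {"locus": position, "alleles": alleles}


def toy_key_rows_by(**key_fields):
    """Hypothetical re-keying stand-in: returns the key field names it received."""
    return list(key_fields)


trimmed = toy_min_rep(100, ["GCAT", "GCGT"])
key_names = toy_key_rows_by(**toy_min_rep(100, ["TAA", "TA"]))
```

Because the result carries both fields by name, a single call can supply the complete new row key.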