This thread documents breaking changes as we work toward stabilizing the development branch (i.e., master, 0.2 beta) as Hail 0.2 proper.
Removed as_array parameter from PCA
pca and hwe_normalized_pca no longer take an as_array parameter. They now always return scores and loadings as arrays (formerly the as_array=True option).
See the overview tutorial for example usage in GWAS, where PC1 becomes scores[0].
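As a rough sketch of what this looks like in practice (not taken from the tutorial): the helper name annotate_first_two_pcs, the field names pc1 and pc2, and the default k=3 are illustrative assumptions, and the matrix table is assumed to have a GT call field. The call expression is passed directly, in line with the dataset-parameter removal described in this thread.

```python
def annotate_first_two_pcs(mt, k=3):
    """Attach PC1 and PC2 to the columns of `mt` (hypothetical helper)."""
    # Imported here so that defining the helper does not require Hail to be loaded.
    import hail as hl

    # The call expression carries its source dataset, so `mt` itself is not passed.
    eigenvalues, scores, loadings = hl.hwe_normalized_pca(mt.GT, k=k)

    # `scores` is a table keyed like the columns of `mt`, with an array-valued
    # `scores` field: PC1 is element 0 and PC2 is element 1.
    sample_scores = scores[mt.col_key].scores
    return mt.annotate_cols(pc1=sample_scores[0], pc2=sample_scores[1])
```

The resulting column fields (mt.pc1, mt.pc2) can then be used as covariates in the regression methods.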
Removed dataset parameter from eight methods
All methods that took a dataset and at least one required expression on that dataset no longer take a dataset parameter at all (the dataset is implicitly the source of the expression):
grm
linear_regression
logistic_regression
linear_mixed_regression
pc_relate
pca
rrm
skat
https://github.com/hail-is/hail/pull/3211
https://github.com/hail-is/hail/pull/3262
Changed ys to y and schema in linear regression
The parameter ys on linear_regression is now y, consistent with the other statistics methods. When y is a single expression, the linreg fields all have type float64, as in the other regression methods.
When y is a list of expressions (even a list of one expression) the behavior is the same as before: the five y-dependent linreg fields have type array[float64].
The field n_complete_samples is now just n.
See the overview tutorial for example usage of the case where y is an expression. In particular, linear_regression_results.linreg.p_value[0].collect() no longer takes the [0] index and becomes linear_regression_results.linreg.p_value.collect().
ld_prune has changed to take a CallExpression instead of a matrix table. The new signature is ld_prune(call_expr, r2=0.2, window=1000000, memory_per_core=256).
ld_prune no longer requires unphased genotypes (though it still makes no use of phasing information). The parameter window has also been renamed bp_window_size.
While we're at it, ld_prune now returns a Table with just ('locus', 'alleles'), which is the set of independent variants at that threshold. Previously it returned the MatrixTable filtered to that set.
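To recover the old behavior (a matrix table filtered to the pruned variants), the returned table can be joined back against the rows. The following is a minimal sketch under a few assumptions: the helper name filter_to_pruned_variants is hypothetical, the matrix table has a GT call field, and its rows are keyed by locus and alleles.

```python
def filter_to_pruned_variants(mt, r2=0.2, bp_window_size=1000000):
    """Keep only the rows of `mt` that survive LD pruning (hypothetical helper)."""
    # Imported here so that defining the helper does not require Hail to be loaded.
    import hail as hl

    # ld_prune takes the call expression, not the matrix table, and returns a
    # table keyed by ('locus', 'alleles').
    pruned = hl.ld_prune(mt.GT, r2=r2, bp_window_size=bp_window_size)

    # A row is kept when its key is present in the pruned table.
    return mt.filter_rows(hl.is_defined(pruned[mt.row_key]))
```

Keeping the pruned table separate also makes it possible to reuse the same variant set on another dataset keyed the same way.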
Minor breaking change: hl.min_rep() now returns a struct of locus (a LocusExpression) and alleles (an ArrayExpression of type str). This makes min_rep and re-key much easier, as in:
mt = mt.key_rows_by(**hl.min_rep(mt.locus, mt.alleles))
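For readers unfamiliar with what minimal representation does to a variant, here is a plain-Python sketch for the biallelic case only. It is not Hail's implementation: the function name min_rep_biallelic and the (position, [ref, alt]) return shape are assumptions chosen to mirror the locus and alleles fields of the struct.

```python
def min_rep_biallelic(position, ref, alt):
    """Trim bases shared by `ref` and `alt`, keeping at least one base in each."""
    # Drop shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]

    # Then drop shared leading bases, shifting the position right each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        position += 1

    return position, [ref, alt]


print(min_rep_biallelic(100, "GCA", "GTA"))  # (101, ['C', 'T'])
print(min_rep_biallelic(100, "CTT", "CT"))   # (100, ['CT', 'C'])
```

Because both the position and the alleles can change, the locus and alleles need to be replaced together, which is why re-keying on the returned struct is convenient.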
Minor change: the parameter names of hl.rand_unif(min, max) are changing to lower and upper.