This thread documents breaking changes as we work toward stabilizing the development branch (i.e., master, 0.2 beta) as Hail 0.2 proper.

# Log of breaking changes in 0.2 beta

**jbloom**#2

**Removed as_array parameter from PCA**

`pca` and `hwe_normalized_pca` no longer take an `as_array` parameter. They now always return scores and loadings as arrays (formerly the `as_array=True` option).

See the overview tutorial for example usage in GWAS, where `PC1` becomes `scores[0]`.
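For downstream code that read principal components by name, the change amounts to indexing into an array. Here is a minimal plain-Python sketch of that mapping. It uses made-up row dictionaries rather than real Hail tables, and the `pcs_as_array` helper and all values are illustrative assumptions:

```python
# Toy rows standing in for per-sample PCA output; the values are made up.
old_row = {"s": "sample_1", "PC1": 0.12, "PC2": -0.40, "PC3": 0.05}  # named-field style
new_row = {"s": "sample_1", "scores": [0.12, -0.40, 0.05]}           # array style


def pcs_as_array(row, k):
    """Hypothetical helper: collect the fields PC1..PCk into one list."""
    return [row[f"PC{i}"] for i in range(1, k + 1)]


# PC1 is now scores[0], PC2 is scores[1], and so on (zero-based indexing).
first_pc_old = old_row["PC1"]
first_pc_new = new_row["scores"][0]
```

The only thing to watch for when porting is the off-by-one: `PCk` corresponds to `scores[k - 1]`.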

**jbloom**#3

**Removed dataset parameter from eight methods**

All methods that took a dataset and at least one required expression on that dataset no longer take a dataset parameter at all (the dataset is implicitly the source of the expression):

`grm`

`linear_regression`

`logistic_regression`

`linear_mixed_regression`

`pc_relate`

`pca`

`rrm`

`skat`

https://github.com/hail-is/hail/pull/3211

https://github.com/hail-is/hail/pull/3262
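To illustrate why the dataset argument is redundant, here is a toy plain-Python model in which every expression remembers the dataset it came from. The `Dataset`, `Expression`, `common_source`, and `toy_regression` names are all hypothetical and are not Hail's classes or implementation; the sketch only shows the idea of recovering the source from the expressions themselves:

```python
class Expression:
    """Toy expression: a named column that remembers its source dataset."""

    def __init__(self, source, name, values):
        self.source = source
        self.name = name
        self.values = values


class Dataset:
    """Toy stand-in for a dataset: just named columns of numbers."""

    def __init__(self, **fields):
        self.fields = fields

    def __getitem__(self, name):
        # Every expression built here carries a reference back to this dataset.
        return Expression(source=self, name=name, values=self.fields[name])


def common_source(*exprs):
    """Return the single dataset all expressions come from, or raise."""
    if len({id(e.source) for e in exprs}) != 1:
        raise ValueError("expressions come from different datasets")
    return exprs[0].source


def toy_regression(y, x):
    """New-style signature: no dataset parameter, it is inferred from y and x."""
    dataset = common_source(y, x)
    return {"dataset": dataset, "n": len(y.values)}


ds = Dataset(pheno=[1.0, 2.0, 3.0], gt=[0, 1, 2])
result = toy_regression(y=ds["pheno"], x=ds["gt"])
```

Under this model, passing the dataset separately could only ever repeat or contradict what the expressions already carry, which is why dropping the parameter loses nothing.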

**jbloom**#4

**Changed ys to y and schema in linear regression**

To match the other statistics methods, the parameter `ys` on `linear_regression` is now `y`. When `y` is a single expression, the `linreg` fields all have type `float64`, as in the other regression methods.

When `y` is a list of expressions (even a list of one expression), the behavior is the same as before: the five y-dependent `linreg` fields have type `array[float64]`.

The field `n_complete_samples` is now just `n`.

See the overview tutorial for example usage of the case where `y` is an expression. In particular, `linear_regression_results.linreg.p_value[0].collect()` no longer takes the `[0]` index.
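The scalar-versus-array distinction can be sketched in plain Python. This is a toy model, not Hail code: `toy_linear_regression` and `slope` are hypothetical names, a tuple stands in for a single phenotype expression, and a list of tuples stands in for a list of expressions:

```python
def slope(y, x):
    """Ordinary least-squares slope of y on x (model with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def toy_linear_regression(y, x):
    """Return scalar results for a single y, array results for a list of y's."""
    if isinstance(y, list):
        # A list of phenotypes (even of length one) gives array-valued fields.
        return {"n": len(x), "beta": [slope(col, x) for col in y]}
    # A single phenotype gives plain float fields.
    return {"n": len(x), "beta": slope(y, x)}


x = (0.0, 1.0, 2.0)
pheno = (1.0, 3.0, 5.0)

single = toy_linear_regression(y=pheno, x=x)    # beta is a float
multi = toy_linear_regression(y=[pheno], x=x)   # beta is a one-element list
```

In the toy model, as in the description above, only the y-dependent field (`beta`) changes shape; `n` is a plain number in both cases.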

`ld_prune` has changed to take a CallExpression instead of a matrix table. The new signature is `ld_prune(call_expr, r2=0.2, window=1000000, memory_per_core=256)`.

**jbloom**#8

`ld_prune` no longer requires unphased genotypes (though it still makes no use of phasing information). The parameter `window` has also been renamed `bp_window_size`.

**konradjk**#9

While we’re at it, `ld_prune` now also returns a Table with just `('locus', 'alleles')`, containing the set of independent variants at that threshold (previously it returned the MatrixTable filtered to that set).
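Since the result is now a table of variant keys rather than a filtered dataset, callers who want the old behavior need to filter their own rows against those keys. A plain-Python sketch of that step, with made-up rows and a hypothetical `filter_to_pruned` helper (none of this is Hail's API):

```python
# Toy variant rows; loci, alleles, and frequencies are made up.
rows = [
    {"locus": "1:1000", "alleles": ("A", "G"), "af": 0.10},
    {"locus": "1:1050", "alleles": ("C", "T"), "af": 0.11},
    {"locus": "1:9000", "alleles": ("G", "T"), "af": 0.35},
]

# Stand-in for the returned table: only the (locus, alleles) keys that survived pruning.
pruned_keys = {("1:1000", ("A", "G")), ("1:9000", ("G", "T"))}


def filter_to_pruned(rows, pruned_keys):
    """Hypothetical helper: keep only rows whose key is in the pruned set."""
    return [r for r in rows if (r["locus"], r["alleles"]) in pruned_keys]


kept = filter_to_pruned(rows, pruned_keys)
```

Returning only the keys leaves the original dataset untouched, so the same pruned set can be reused to filter several datasets.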

**konradjk**#17

Minor breaking change: `hl.min_rep()` now returns a `struct` of `locus` (a `LocusExpression`) and `alleles` (an `ArrayExpression` of type `str`). This makes it much easier to apply min_rep and re-key, as in:

```
mt = mt.key_rows_by(**hl.min_rep(mt.locus, mt.alleles))
```
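The reason the struct return value helps is that its field names line up with the key names, so it can be unpacked with `**`. The plain-Python sketch below shows that pattern together with a simple minimal-representation routine (trim shared trailing bases, then shared leading bases). It is an assumption-laden toy: this `min_rep` takes an integer position instead of a locus, `key_rows_by` is a hypothetical stand-in, and neither is Hail's implementation:

```python
def min_rep(position, alleles):
    """Toy minimal representation: returns a dict with 'locus' and 'alleles'."""
    alleles = list(alleles)
    # Trim shared trailing bases, keeping at least one base in every allele.
    while min(map(len, alleles)) > 1 and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Trim shared leading bases, advancing the position for each base removed.
    while min(map(len, alleles)) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        position += 1
    return {"locus": position, "alleles": alleles}


def key_rows_by(**fields):
    """Hypothetical stand-in that just reports the key fields it received."""
    return fields


# The dict's keys become keyword arguments, mirroring key_rows_by(**hl.min_rep(...)).
key = key_rows_by(**min_rep(100, ["GCAT", "GTAT"]))
```

Here `GCAT`/`GTAT` at position 100 reduces to `C`/`T` at position 101, and the result arrives in `key_rows_by` as the `locus` and `alleles` keyword arguments without any manual repacking.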

**wang**#18

Minor change: the parameter names of `hl.rand_unif(min, max)` are changing to `lower` and `upper`.
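A rename like this only affects callers that pass the arguments by keyword. The plain-Python sketch below shows the effect using a toy `rand_unif` built on the standard library's `random.uniform`; it is a stand-in with the new parameter names, not Hail's function:

```python
import random


def rand_unif(lower, upper):
    """Toy stand-in with the new parameter names; not Hail's implementation."""
    return random.uniform(lower, upper)


value = rand_unif(0.0, 1.0)              # positional call: unaffected by the rename
value = rand_unif(lower=0.0, upper=1.0)  # keyword call using the new names

try:
    rand_unif(min=0.0, max=1.0)          # keyword call using the old names
except TypeError as err:
    # Python rejects the unknown keyword arguments.
    old_call_error = str(err)
```

Code that called the function positionally needs no change; only calls spelled with `min=` and `max=` have to be updated.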