[Breaking change] New changes to the select, annotate, and key_by interfaces in 0.2



https://github.com/hail-is/hail/pull/3297

This PR changes how several functions handle the keys of Tables and MatrixTables. (The doc links will be updated once this pull request lands.)

key_by(*exprs, **named_exprs) (and key_rows_by and key_cols_by)

is the only method that can modify key fields. Its interface is identical to the current select interface: positional (unnamed) exprs must be field references (not necessarily to top-level fields), while arbitrary expressions can still be used as keys if they are given a name as keyword arguments. Any former key fields that are not part of the new key are retained as value fields.
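To make the rules concrete without a Spark cluster, here is a plain-Python model of the key_by semantics described above. ToyTable and key_by below are hypothetical stand-ins, not Hail's API; the model only handles top-level fields and represents named expressions as Python callables evaluated per row. In Hail itself the corresponding call would be something like ht.key_by('locus').

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a keyed table (not Hail's API)."""
    fields: Tuple[str, ...]      # all field names, in order
    key: Tuple[str, ...]         # the subset of fields forming the key
    rows: List[Dict[str, Any]]   # one dict per row

    def value_fields(self) -> Tuple[str, ...]:
        return tuple(f for f in self.fields if f not in self.key)


def key_by(t: ToyTable, *fields: str,
           **named: Callable[[Dict[str, Any]], Any]) -> ToyTable:
    """Re-key `t`. Positional args must name existing fields; keyword args
    compute new (or replacement) key fields from each row. Former key fields
    that are not in the new key stay in the row as value fields."""
    for f in fields:
        if not isinstance(f, str) or f not in t.fields:
            raise ValueError(f"positional key {f!r} must reference an existing field")
    # Each named expression is evaluated against the original row.
    rows = [{**r, **{n: fn(r) for n, fn in named.items()}} for r in t.rows]
    new_fields = t.fields + tuple(n for n in named if n not in t.fields)
    return ToyTable(new_fields, tuple(fields) + tuple(named), rows)
```

Keying a (locus, alleles)-keyed table by locus alone leaves alleles in place as an ordinary value field, which is the "former key fields are retained" rule.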

partition_rows_by(partition_key, *exprs, **named_exprs)

lets you specify a partition key drawn from the key fields of the new matrix table. It is identical to key_rows_by, except that its first argument is a list of key fields to use as the partition key. This interface is still subject to change.

annotate(**named_exprs) (and annotate_rows and annotate_cols)

Attempting to annotate over (that is, overwrite) a key field raises an error. If you really want to overwrite a key field, either use key_by directly (if the field should remain a key), or first re-key by other fields so that it becomes a value field, and then call annotate.
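A minimal plain-Python sketch of the annotate rule, assuming the same hypothetical ToyTable model (not Hail's API): value fields can be added or overwritten, key fields are refused, and re-keying first turns a former key into an annotatable value field.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a keyed table (not Hail's API)."""
    fields: Tuple[str, ...]
    key: Tuple[str, ...]
    rows: List[Dict[str, Any]]


def annotate(t: ToyTable, **named: Callable[[Dict[str, Any]], Any]) -> ToyTable:
    """Add or overwrite value fields; refuse to touch key fields."""
    clash = [n for n in named if n in t.key]
    if clash:
        raise ValueError(f"cannot annotate key field(s) {clash}; use key_by instead")
    # Every expression sees the original row, not partially updated values.
    rows = [{**r, **{n: fn(r) for n, fn in named.items()}} for r in t.rows]
    new_fields = t.fields + tuple(n for n in named if n not in t.fields)
    return ToyTable(new_fields, t.key, rows)
```

In the model, the "specify a new key and then call annotate" workaround amounts to building a table keyed only by locus and then annotating alleles.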

select(*exprs, **named_exprs) (and select_rows and select_cols)

likewise deals only with value fields. All key fields are preserved automatically: if locus and alleles are already the row keys of a MatrixTable, mt.select_rows() drops all the other row fields. If you want to
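The following plain-Python sketch models select under the same hypothetical ToyTable (not Hail's API). Key fields are always carried through, so select with no arguments mirrors mt.select_rows() keeping only locus and alleles. Rejecting key fields passed as arguments is an assumption of this model, based on "deals only with value fields".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a keyed table (not Hail's API)."""
    fields: Tuple[str, ...]
    key: Tuple[str, ...]
    rows: List[Dict[str, Any]]


def select(t: ToyTable, *fields: str,
           **named: Callable[[Dict[str, Any]], Any]) -> ToyTable:
    """Keep every key field plus the listed value fields and named expressions."""
    targets = list(fields) + list(named)
    bad = [f for f in targets if f in t.key]
    if bad:
        # Assumption: select only names value fields; keys come along automatically.
        raise ValueError(f"key field(s) {bad} are always kept; do not select them")
    missing = [f for f in fields if f not in t.fields]
    if missing:
        raise KeyError(f"unknown field(s): {missing}")
    rows = []
    for r in t.rows:
        row = {k: r[k] for k in t.key}           # keys are preserved first
        row.update({f: r[f] for f in fields})     # then selected value fields
        row.update({n: fn(r) for n, fn in named.items()})  # then new fields
        rows.append(row)
    return ToyTable(t.key + tuple(targets), t.key, rows)
```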

drop(*fields)

To drop a key field, first remove it from the key with key_by.
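A plain-Python sketch of the drop rule, again using a hypothetical ToyTable model rather than Hail's API. Dropping a key field fails; after re-keying (modelled here with dataclasses.replace, standing in for key_by('locus')) the former key field can be dropped like any value field.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a keyed table (not Hail's API)."""
    fields: Tuple[str, ...]
    key: Tuple[str, ...]
    rows: List[Dict[str, Any]]


def drop(t: ToyTable, *fields: str) -> ToyTable:
    """Remove value fields; key fields must be removed from the key first."""
    keyed = [f for f in fields if f in t.key]
    if keyed:
        raise ValueError(f"cannot drop key field(s) {keyed}; re-key with key_by first")
    gone = set(fields)
    rows = [{f: v for f, v in r.items() if f not in gone} for r in t.rows]
    return ToyTable(tuple(f for f in t.fields if f not in gone), t.key, rows)


def rekey_to_locus(t: ToyTable) -> ToyTable:
    """Hypothetical helper: the model's equivalent of key_by('locus')."""
    return replace(t, key=("locus",))
```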

transmute(**named_exprs)

The restrictions on what transmute can overwrite are identical to those for annotate. The main change is that expressions may reference key fields, but doing so will not cause those fields to be dropped; referenced value fields will continue to be dropped.
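Because a Python callable cannot report which fields it reads, this plain-Python sketch represents each named expression as a hypothetical (input field names, function) pair. ToyTable and transmute here are illustrative stand-ins, not Hail's API. The model drops referenced value fields but keeps referenced key fields, and it refuses to overwrite keys just as annotate does.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical expression form: (fields the expression reads, function of those values)
Expr = Tuple[Tuple[str, ...], Callable[..., Any]]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a keyed table (not Hail's API)."""
    fields: Tuple[str, ...]
    key: Tuple[str, ...]
    rows: List[Dict[str, Any]]


def transmute(t: ToyTable, **named: Expr) -> ToyTable:
    clash = [n for n in named if n in t.key]
    if clash:
        raise ValueError(f"cannot transmute key field(s) {clash}; use key_by instead")
    referenced = {f for inputs, _ in named.values() for f in inputs}
    # Referenced value fields are dropped; referenced key fields are kept,
    # as is any field redefined under its own name.
    dropped = {f for f in referenced if f not in t.key and f not in named}
    rows = []
    for r in t.rows:
        new_vals = {n: fn(*(r[f] for f in inputs)) for n, (inputs, fn) in named.items()}
        row = {f: v for f, v in r.items() if f not in dropped}
        row.update(new_vals)
        rows.append(row)
    new_fields = (tuple(f for f in t.fields if f not in dropped)
                  + tuple(n for n in named if n not in t.fields))
    return ToyTable(new_fields, t.key, rows)
```

Building a label from the key field locus and the value field qual keeps locus but drops qual.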

