https://github.com/hail-is/hail/pull/3297
This PR will change the way keys on Tables and MatrixTables are dealt with in certain functions. (The doc links will be updated once this pull request lands.)
key_by(*exprs, **named_exprs) (and key_rows_by and key_cols_by)
is the only method that can modify key fields. Its interface is identical to the current select interface: unnamed arguments must be field references (not necessarily top-level), while arbitrary expressions can still be used if they are given a name. Former key fields that are not part of the new key are retained as value fields.
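The re-keying rule can be illustrated with a minimal pure-Python sketch. This is a toy model, not Hail code: `toy_key_by` is a hypothetical function, a schema is assumed to be just a list of key-field names plus a list of value-field names, and named expressions are represented by placeholder strings. It shows that unnamed arguments must reference existing fields and that former key fields fall back to value fields.

```python
# Toy model, NOT Hail code: a schema is a key-field list plus a value-field list.
def toy_key_by(key_fields, value_fields, *refs, **named):
    """Re-key a schema, returning (new_key, new_values).

    refs:  names of existing fields (unnamed arguments must be field references).
    named: new field name -> placeholder standing in for an arbitrary expression.
    """
    existing = list(key_fields) + list(value_fields)
    for ref in refs:
        if ref not in existing:
            raise ValueError(f"unnamed argument {ref!r} must reference an existing field")
    new_key = list(refs) + list(named)
    # Every field not in the new key (including former key fields) becomes a value field.
    new_values = [f for f in existing if f not in new_key]
    return new_key, new_values


# Re-key a locus/alleles table by rsid: locus and alleles are kept as value fields.
print(toy_key_by(["locus", "alleles"], ["rsid", "qual"], "rsid"))
# prints (['rsid'], ['locus', 'alleles', 'qual'])
```

A named expression (here the placeholder `pos="locus.position"`) is accepted even though it is not a field reference, because it carries its own name.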
partition_rows_by(partition_key, *exprs, **named_exprs)
lets you specify a partition key drawn from the key fields of the new matrix table. It’s identical to key_rows_by, but takes a list of key fields as its first argument, which is used as the partition key. This interface is still subject to change.
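A minimal toy sketch of the extra check partition_rows_by adds on top of a key_rows_by-style re-key. `toy_partition_rows_by` is hypothetical and not Hail code; it only enforces what the text states (the partition key is chosen from the new key fields) and does not model any further constraints Hail may impose.

```python
# Toy model, NOT Hail code: schemas are plain lists of field names.
def toy_partition_rows_by(row_key, row_values, partition_key, *refs, **named):
    """Re-key like key_rows_by, also returning the partition key.

    Returns (partition_key, new_key, new_values).
    """
    existing = list(row_key) + list(row_values)
    for ref in refs:
        if ref not in existing:
            raise ValueError(f"unnamed argument {ref!r} must reference an existing field")
    new_key = list(refs) + list(named)
    # The partition key must be chosen from the fields of the new row key.
    outside = [f for f in partition_key if f not in new_key]
    if outside:
        raise ValueError(f"partition key fields {outside} are not in the new row key")
    new_values = [f for f in existing if f not in new_key]
    return list(partition_key), new_key, new_values


print(toy_partition_rows_by(["locus", "alleles"], ["qual"], ["locus"], "locus", "alleles"))
# prints (['locus'], ['locus', 'alleles'], ['qual'])
```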
annotate(**named_exprs) (and annotate_rows and annotate_cols)
Attempting to annotate over (overwrite) a key field raises an error. If you really want to overwrite a key field, use key_by directly (if the field should remain a key), or re-key the table without that field and then call annotate.
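The annotate restriction can be sketched with a hypothetical toy function, `toy_annotate`, which is not Hail code. Schemas are assumed to be plain lists of field names and expressions are placeholder strings; the point is only that a key-field name on the left-hand side is rejected while value fields can be added or overwritten.

```python
# Toy model, NOT Hail code: schemas are plain lists of field names.
def toy_annotate(key_fields, value_fields, **named):
    """Add or overwrite value fields; overwriting a key field is an error."""
    clash = [name for name in named if name in key_fields]
    if clash:
        raise KeyError(f"cannot annotate over key field(s) {clash}; use key_by instead")
    # Overwritten value fields keep their position; new fields are appended.
    new_values = list(value_fields) + [n for n in named if n not in value_fields]
    return list(key_fields), new_values


print(toy_annotate(["locus"], ["qual"], qual="qual * 2", af="AC / AN"))
# prints (['locus'], ['qual', 'af'])
```

Calling `toy_annotate(["locus"], ["qual"], locus="...")` raises `KeyError`, mirroring the error described above.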
select(*exprs, **named_exprs) (and select_rows and select_cols)
likewise deals only with value fields. All key fields are automatically preserved: if locus and alleles are already the row key of a MatrixTable, calling mt.select_rows() with no arguments drops all other row fields. If you want to
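A toy sketch of select’s key preservation, using a hypothetical `toy_select` function (not Hail code). Two behaviours are assumptions made for this sketch rather than statements from the text: unnamed arguments must name existing value fields, and a named argument may not reuse a key-field name.

```python
# Toy model, NOT Hail code: schemas are plain lists of field names.
def toy_select(key_fields, value_fields, *refs, **named):
    """Choose the value fields to keep; every key field is kept automatically."""
    for ref in refs:
        # Assumption for this sketch: unnamed arguments must name existing value fields.
        if ref not in value_fields:
            raise ValueError(f"{ref!r} is not a value field")
    # Assumption for this sketch: select may not overwrite a key field.
    clash = [name for name in named if name in key_fields]
    if clash:
        raise KeyError(f"select cannot overwrite key field(s) {clash}")
    return list(key_fields), list(refs) + list(named)


# With locus and alleles as the row key, an empty select drops every other row field.
print(toy_select(["locus", "alleles"], ["rsid", "qual", "info"]))
# prints (['locus', 'alleles'], [])
```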
drop(*fields)
To drop a key field, first remove it from the key with key_by.
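A toy sketch of the drop rule, using a hypothetical `toy_drop` function (not Hail code) over plain lists of field names. Dropping a key field fails; once the field has been moved out of the key (simulated here by passing an already re-keyed schema), it can be dropped like any other value field.

```python
# Toy model, NOT Hail code: schemas are plain lists of field names.
def toy_drop(key_fields, value_fields, *fields):
    """Drop value fields; dropping a key field is an error."""
    keyed = [f for f in fields if f in key_fields]
    if keyed:
        raise KeyError(f"remove {keyed} from the key with key_by before dropping")
    return list(key_fields), [f for f in value_fields if f not in fields]


# Schema after re-keying by alleles alone: locus is now a value field and can be dropped.
print(toy_drop(["alleles"], ["locus", "qual"], "locus"))
# prints (['alleles'], ['qual'])
```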
transmute(**named_exprs)
The restrictions on which fields transmute can overwrite are identical to those for annotate. The main change is that expressions may reference key fields, but doing so does not cause those fields to be dropped. Referenced value fields will continue to be dropped.
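A toy sketch of transmute’s dropping behaviour, using a hypothetical `toy_transmute` function (not Hail code). Since plain strings cannot report which fields an expression uses, each named argument is assumed here to be given as the list of field names its expression references.

```python
# Toy model, NOT Hail code: schemas are plain lists of field names.
def toy_transmute(key_fields, value_fields, **named):
    """named maps each new field name to the list of fields its expression references."""
    clash = [name for name in named if name in key_fields]
    if clash:
        raise KeyError(f"cannot transmute over key field(s) {clash}")
    referenced = {f for refs in named.values() for f in refs}
    # Referenced value fields are dropped; referenced key fields are left alone.
    kept = [f for f in value_fields if f not in referenced and f not in named]
    # New or overwritten fields are appended after the surviving value fields.
    return list(key_fields), kept + list(named)


print(toy_transmute(["locus", "alleles"], ["AC", "AN", "qual"],
                    AF=["AC", "AN"], n_alleles=["alleles"]))
# prints (['locus', 'alleles'], ['qual', 'AF', 'n_alleles'])
```

Here AC and AN are consumed and dropped, while alleles, although referenced, stays in the key.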