https://github.com/hail-is/hail/pull/3297
This PR changes how several Table and MatrixTable methods handle key fields. (The doc links will be updated once this pull request lands.)
key_by(*exprs, **named_exprs)
key_by (and key_rows_by and key_cols_by) is the only method that can modify key fields. Its interface is identical to the current select interface: unnamed exprs must be field references (not necessarily top-level), while other expressions can still be used as long as a name is provided. Any former key fields not used in the new key are retained as value fields.
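To make the key/value split concrete, here is a toy model in plain Python. It is not Hail's API. A row is assumed to be a dict, a named expression is assumed to be a Python callable on the row, and key_by below is a hypothetical stand-in that handles top-level fields only. It shows that unnamed arguments must already be fields, that named arguments may be arbitrary expressions, and that a former key field left out of the new key simply stays behind as a value field.

```python
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]


def key_by(row: Row, *fields: str,
           **named: Callable[[Row], Any]) -> Tuple[Row, Tuple[str, ...]]:
    """Hypothetical toy model of key_by on one row; returns (new_row, new_key).

    Unnamed arguments must be existing (top-level) field names; named
    arguments are computed from the row. Nothing is removed from the row,
    so former key fields not reused in the new key remain as value fields.
    """
    new_row = dict(row)
    new_key = []
    for field in fields:
        if field not in row:
            raise KeyError(f"unnamed key expression must be a field reference: {field!r}")
        new_key.append(field)
    for name, expr in named.items():
        new_row[name] = expr(row)
        new_key.append(name)
    return new_row, tuple(new_key)


old_key = ("locus", "alleles")  # the key before re-keying, for illustration
row = {"locus": "chr1:100", "alleles": ("A", "T"), "qual": 30}
new_row, new_key = key_by(row, "locus", contig=lambda r: r["locus"].split(":")[0])
print(new_key)             # ('locus', 'contig')
print(new_row["alleles"])  # ('A', 'T'): a former key field, now a value field
```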
partition_rows_by(partition_key, *exprs, **named_exprs)
partition_rows_by lets you specify a partition key drawn from the key fields of the new matrix table. It is identical to key_rows_by, except that it takes a list of key fields as its first argument to use as the partition key. This interface is still subject to change.
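The one constraint stated here is that the partition key comes from the new key fields. The check below captures that constraint as a small validation step. check_partition_key is a hypothetical helper and not part of Hail. The text does not say whether field order or position matters, so this sketch checks membership only. Hail may impose stricter rules that are not modeled here.

```python
from typing import Sequence


def check_partition_key(partition_key: Sequence[str], new_key: Sequence[str]) -> None:
    """Hypothetical validation for a toy partition_rows_by.

    Only checks that every partition-key field is one of the new key fields.
    """
    missing = [f for f in partition_key if f not in new_key]
    if missing:
        raise ValueError(f"partition key fields not in the new key: {missing}")


check_partition_key(["locus"], ["locus", "alleles"])  # passes silently
try:
    check_partition_key(["qual"], ["locus", "alleles"])
except ValueError as err:
    print(err)  # partition key fields not in the new key: ['qual']
```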
annotate(**named_exprs)
annotate (and annotate_rows and annotate_cols) raises an error if you attempt to annotate over a key field. If you really do want to overwrite a key field, either use key_by directly (if the field should remain a key) or first specify a new key that excludes the field and then call annotate.
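The rule can be shown with the same kind of toy model: plain Python, not Hail's API. Rows are assumed to be dicts, the key is a tuple of field names, and expressions are callables. The hypothetical annotate below adds or overwrites value fields freely but rejects any name that collides with a key field.

```python
from typing import Any, Callable, Dict, Sequence

Row = Dict[str, Any]


def annotate(row: Row, key: Sequence[str], **named: Callable[[Row], Any]) -> Row:
    """Hypothetical toy model of annotate: only value fields may be set."""
    clashes = sorted(set(named) & set(key))
    if clashes:
        raise ValueError(f"cannot annotate over key fields {clashes}; use key_by instead")
    new_row = dict(row)
    for name, expr in named.items():
        new_row[name] = expr(row)  # every expression sees the original row
    return new_row


key = ("locus", "alleles")
row = {"locus": "chr1:100", "alleles": ("A", "T"), "qual": 30}
print(annotate(row, key, high_qual=lambda r: r["qual"] >= 20)["high_qual"])  # True
try:
    annotate(row, key, locus=lambda r: "chr2:5")
except ValueError as err:
    print(err)  # cannot annotate over key fields ['locus']; use key_by instead
```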
select(*exprs, **named_exprs)
select (and select_rows and select_cols) likewise deals only with value fields. All key fields are preserved automatically: if locus and alleles are already the row key of a MatrixTable, mt.select_rows() drops all other row fields. If you want to
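In a toy model, "key fields are preserved automatically" comes down to the following: the selected row always starts from the key. This is plain Python, not Hail's API. select is a hypothetical stand-in over a dict row, and named expressions are omitted for brevity. Calling it with no field arguments is the analogue of mt.select_rows().

```python
from typing import Any, Dict, Sequence

Row = Dict[str, Any]


def select(row: Row, key: Sequence[str], *fields: str) -> Row:
    """Hypothetical toy model of select: key fields are always kept."""
    new_row = {k: row[k] for k in key}           # keys preserved automatically
    new_row.update({f: row[f] for f in fields})  # plus any requested value fields
    return new_row


key = ("locus", "alleles")
row = {"locus": "chr1:100", "alleles": ("A", "T"), "qual": 30, "filters": set()}
print(select(row, key))          # {'locus': 'chr1:100', 'alleles': ('A', 'T')}
print(select(row, key, "qual"))  # {'locus': 'chr1:100', 'alleles': ('A', 'T'), 'qual': 30}
```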
drop(*fields)
Key fields cannot be dropped directly. To drop a key field, first remove it from the key with key_by.
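Here is the two-step workflow (re-key, then drop) in the same plain-Python toy model. This is not Hail's API. drop is a hypothetical stand-in over a dict row. The re-keying step is represented simply by passing a smaller key tuple.

```python
from typing import Any, Dict, Sequence

Row = Dict[str, Any]


def drop(row: Row, key: Sequence[str], *fields: str) -> Row:
    """Hypothetical toy model of drop: refuses to drop key fields."""
    key_fields = [f for f in fields if f in key]
    if key_fields:
        raise ValueError(f"remove {key_fields} from the key with key_by before dropping")
    return {k: v for k, v in row.items() if k not in fields}


key = ("locus", "alleles")
row = {"locus": "chr1:100", "alleles": ("A", "T"), "qual": 30}
print(drop(row, key, "qual"))  # {'locus': 'chr1:100', 'alleles': ('A', 'T')}
try:
    drop(row, key, "alleles")
except ValueError as err:
    print(err)  # remove ['alleles'] from the key with key_by before dropping

# After re-keying by locus alone, alleles is an ordinary value field and can go.
print(drop(row, ("locus",), "alleles"))  # {'locus': 'chr1:100', 'qual': 30}
```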
transmute(**named_exprs)
The restrictions on what transmute can overwrite are identical to those on annotate. The main change is that expressions may reference key fields, but doing so does not cause those fields to be dropped. Referenced value fields continue to be dropped.
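The transmute rules can be sketched in the same plain-Python toy model. This is not Hail's API. Hail infers which fields an expression references, but this sketch cannot do that, so it assumes each expression is passed as a (callable, referenced_field_names) pair. transmute below is a hypothetical stand-in. The example references the key field locus and the value fields ac and an. Only the value fields are dropped.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Row = Dict[str, Any]
# Hypothetical: an expression is a callable plus the field names it references.
Expr = Tuple[Callable[[Row], Any], Sequence[str]]


def transmute(row: Row, key: Sequence[str], **named: Expr) -> Row:
    """Toy model of transmute: like annotate, then drop referenced value fields."""
    clashes = sorted(set(named) & set(key))
    if clashes:
        raise ValueError(f"cannot transmute over key fields {clashes}")
    referenced = set()
    new_values = {}
    for name, (expr, refs) in named.items():
        new_values[name] = expr(row)
        referenced.update(refs)
    dropped = referenced - set(key)  # referenced key fields are never dropped
    new_row = {k: v for k, v in row.items() if k not in dropped}
    new_row.update(new_values)
    return new_row


key = ("locus", "alleles")
row = {"locus": "chr1:100", "alleles": ("A", "T"), "ac": 3, "an": 10}
out = transmute(row, key,
                af=(lambda r: r["ac"] / r["an"], ["ac", "an"]),
                contig=(lambda r: r["locus"].split(":")[0], ["locus"]))
print(out)  # {'locus': 'chr1:100', 'alleles': ('A', 'T'), 'af': 0.3, 'contig': 'chr1'}
```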