Annotate_entries usage

Hi!

I have an array of floats, with each element corresponding to one entry in a MatrixTable. I would like to annotate each entry with its value from the array. Here is an example:
array = [1, 1, 1]
mt = mt.annotate_entries(featureName = array)

The length of the array equals the number of entries in the MatrixTable.

As a result, each entry is annotated with the entire array instead of a single value from it:

+----------+-----------+-----------+
| locus    | alleles   | s         |
+----------+-----------+-----------+
| locus    | array     | str       |
+----------+-----------+-----------+
| 1:909917 | ["G","A"] | "HG00733" |
| 1:909917 | ["G","A"] | "HG01874" |
| 1:909917 | ["G","A"] | "HG01970" |
| 1:909917 | ["G","A"] | "HG02250" |
| 1:909917 | ["G","A"] | "HG02373" |
+----------+-----------+-----------+

+--------------------------------------------------------------------------------------+
| featureName                                                                          |
+--------------------------------------------------------------------------------------+
| array                                                                                |
+--------------------------------------------------------------------------------------+
| [1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1… |
| [1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1… |
| [1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1… |
| [1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1… |
| [1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1… |
+--------------------------------------------------------------------------------------+

Could you please suggest how I can annotate the first MatrixTable entry with the first element of the array, the second entry with the second element, and so on?

Thanks!
Nikita

First, a note which I think you know, but others reading this question later may not – this won’t scale to large matrix tables, since the array is in memory on a single computer.

In order to annotate entries with an element from the array, you need to do something with the following pattern:

array = ...
lit = hl.literal(array)
mt = mt.annotate_entries(featureName = lit[<index expression here>])

The question is what to put in the index expression, and that depends on whether your array is column- or row-major. We can first add row and col indices:

mt = mt.add_row_index().add_col_index()

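# row_idx and col_idx are 64-bit integers; cast the result to int32 for array indexing

# row major: all entries of row 0 first, then row 1, and so on
n_cols = mt.count_cols()
mt = mt.annotate_entries(featureName = lit[hl.int32(mt.row_idx * n_cols + mt.col_idx)])

# column major: all entries of column 0 first, then column 1, and so on
n_rows = mt.count_rows()
mt = mt.annotate_entries(featureName = lit[hl.int32(mt.col_idx * n_rows + mt.row_idx)])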
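If it is unclear which layout your array uses, the index arithmetic can be checked in plain Python before running it in Hail. This is only a sketch: the helper name flat_index and the tiny 2 x 3 matrix are made up for illustration and are not part of Hail.

```python
def flat_index(row, col, n_rows, n_cols, order="row"):
    """Return the position of matrix entry (row, col) in a flattened list.

    order="row": rows are stored one after another (row major).
    order="col": columns are stored one after another (column major).
    """
    if order == "row":
        return row * n_cols + col
    if order == "col":
        return col * n_rows + row
    raise ValueError("order must be 'row' or 'col'")


# Hypothetical 2 x 3 matrix used only to check the arithmetic.
n_rows, n_cols = 2, 3
matrix = [[10, 11, 12],
          [20, 21, 22]]

# Flatten the same matrix both ways.
row_major = [matrix[r][c] for r in range(n_rows) for c in range(n_cols)]
col_major = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]

# Entry (1, 2) is recovered from either layout with the matching formula.
print(row_major[flat_index(1, 2, n_rows, n_cols, "row")])
print(col_major[flat_index(1, 2, n_rows, n_cols, "col")])
```

Both print statements look up the same entry, matrix[1][2], so if the formula you picked returns a different value on your own data, the array is probably in the other layout.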

Thanks, Tim!

OK, then one more related question about the note you mentioned.
What would be the best way to annotate each entry in a MatrixTable with the output of some function that takes entry fields (e.g. AD) as input? For example, I need to evaluate a function that takes the allelic depth of each entry and annotate each entry with the result.

In my post above, you can treat the array as the result of applying a function to each entry's AD (or another field).
Could you please suggest the most efficient way of implementing this?

Thanks!

Hail annotate functions are declarative ways to describe a transformation to be applied in parallel to all records. As an example, I’ll say that the function takes the AD and returns the fraction of reads coming from the reference allele:

def frac_ref_reads(ad):
    return ad[0] / hl.sum(ad)

mt = mt.annotate_entries(frac_ref = frac_ref_reads(mt.AD))
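To spot-check the annotation on a handful of entries, a plain-Python version of the same calculation can serve as a reference. This is a sketch and not Hail code: the name frac_ref_reads_py is made up, and returning None when the total depth is zero is an assumption of this sketch, not a statement about what the Hail expression does in that case.

```python
def frac_ref_reads_py(ad):
    """Plain-Python reference: fraction of reads supporting the reference allele.

    ad is a list of per-allele read depths, with the reference allele first.
    Returns None when the total depth is zero (an assumption of this sketch).
    """
    total = sum(ad)
    if total == 0:
        return None
    return ad[0] / total


# Hypothetical AD values for three entries.
example_ads = [[30, 10], [0, 12], [0, 0]]
expected = [frac_ref_reads_py(ad) for ad in example_ads]
print(expected)
```

The values in expected can then be compared by hand with frac_ref for the same entries in the annotated MatrixTable.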