Concatenating a float64 field onto an array<float64> field

I generated two fields in my MatrixTable: one is an array with N elements, the other is just a float64 field. I want to append the second one to the end of the first. For instance, if the first field for the first row (locus) looks like:

[0, 0, 1, 0]

and the 2nd looks like:

0.2

The end result should be:

[0, 0, 1, 0, 0.2]

I tried converting the float64 field into an array and then merging the two in an annotate_rows statement, but it doesn’t work: with +, Hail tries to zip the arrays element-wise, and I don’t see any hl.concat() function for this purpose. Here is my code:

pos_min = mt.aggregate_rows(hl.agg.min(mt.pos))
pos_max = mt.aggregate_rows(hl.agg.max(mt.pos))

mt = mt.annotate_rows(
    normalized_pos=(mt.pos - pos_min) / (pos_max - pos_min)
)

mt = mt.annotate_rows(normalized_pos_array=[mt.normalized_pos])
# Concatenating two fields - normalized_pos which is float64 and chromosome_one_hot which is array<float64>
mt_concat = mt.annotate_rows(chromosome_one_hot=mt.chromosome_one_hot + mt.normalized_pos_array)
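For intuition, here is a plain-Python sketch (no Hail involved) of what the two annotate_rows steps above are meant to compute per row: min-max normalize pos as (pos - min) / (max - min), then place that scalar at the end of the row's one-hot array. The positions and one-hot vectors are made-up illustration data, and the zero-span guard is an addition of this sketch that the original Hail code does not have.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal: avoid division by zero (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical rows: one position and one chromosome one-hot vector per row.
positions = [100, 300, 600]
one_hot = [
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

normalized = min_max_normalize(positions)

# Append each row's scalar to the end of that row's array, building new lists.
features = [row + [x] for row, x in zip(one_hot, normalized)]
print(features[1])  # [1.0, 0.0, 0.0, 0.0, 0.4]
```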

The last line is wrong of course and that’s what I am trying to figure out.

Hi @Nik,

The function you’re looking for is extend, used like mt.chromosome_one_hot.extend(mt.normalized_pos_array). But even simpler is mt.chromosome_one_hot.append(mt.normalized_pos).
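As a plain-Python analogy (the helper functions below are illustrative, not Hail’s API), extend adds every element of another array while append adds a single element, so extending with a one-element array gives the same result as appending the scalar. Like Hail’s versions, these helpers return a new array and leave the input unchanged.

```python
def extend(arr, other):
    """Return a new list: arr followed by every element of other."""
    return [*arr, *other]

def append(arr, x):
    """Return a new list: arr followed by the single element x."""
    return [*arr, x]

one_hot = [0.0, 0.0, 1.0, 0.0]
normalized_pos = 0.2

via_extend = extend(one_hot, [normalized_pos])  # wrap the scalar in a list first
via_append = append(one_hot, normalized_pos)    # pass the scalar directly
print(via_append)  # [0.0, 0.0, 1.0, 0.0, 0.2]
```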


This doesn’t seem to work. The initial array has 23 elements, and after running append() it still has 23:

mt.annotate_rows(chromosome_one_hot=mt.chromosome_one_hot.append(mt.normalized_pos))
res = mt.head(1).chromosome_one_hot.collect()
len(res[0])

and extend() gives the same result.

Interestingly though if I run:

mt.annotate_rows(chromosome_one_hot_full = mt.chromosome_one_hot.append(mt.normalized_pos))
mt.chromosome_one_hot_full.show()

I can’t access chromosome_one_hot_full, as if the annotate_rows() call had silently failed…

If I just run

mt.chromosome_one_hot.append(mt.normalized_pos).collect()

It works, but that’s not what I want: I need the result stored as a field in the MatrixTable, without calling collect().

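Hail’s interface is “immutable”: annotate_rows doesn’t modify mt in place, it returns a new MatrixTable. In both cases, you create a new “recipe” for the MatrixTable but then ignore it instead of assigning it back to mt. Try this: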

mt = mt.annotate_rows(
    chromosome_one_hot = mt.chromosome_one_hot.append(
        mt.normalized_pos
    )
)
res = mt.head(1).chromosome_one_hot.collect()
len(res[0])

You can see more examples of this in the GWAS Tutorial, the Annotation How-To Guides, and the Genetics How-To Guides. The Overview is pretty detailed but might also be of interest.
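To make the “recipe” point concrete, here is a toy sketch. ToyTable is a hypothetical stand-in, not Hail’s MatrixTable, and it takes Python lambdas where Hail takes expressions; it only mimics the one property that matters here: annotate_rows returns a new table and never changes the one it was called on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ToyTable:
    """Hypothetical immutable table: methods return new objects, never mutate self."""
    rows: Tuple[Dict[str, object], ...]

    def annotate_rows(self, **fields: Callable[[dict], object]) -> "ToyTable":
        # Build a NEW table; each field function sees the original row.
        new_rows = tuple(
            {**row, **{name: fn(row) for name, fn in fields.items()}}
            for row in self.rows
        )
        return ToyTable(new_rows)


mt = ToyTable(({"one_hot": [0.0, 0.0, 1.0, 0.0], "normalized_pos": 0.2},))

# Result discarded: mt still refers to the original, unchanged table.
mt.annotate_rows(one_hot=lambda r: r["one_hot"] + [r["normalized_pos"]])
print(len(mt.rows[0]["one_hot"]))  # 4

# Result reassigned: mt now refers to the new table.
mt = mt.annotate_rows(one_hot=lambda r: r["one_hot"] + [r["normalized_pos"]])
print(len(mt.rows[0]["one_hot"]))  # 5
```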
