I’m trying to pull entry data out of Hail to build a predictive model with spark.ml. I came up with the following code:
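import hail as hl

# tmp is the path to the MatrixTable
mt = hl.read_matrix_table(tmp)
mt = mt.unfilter_entries()
# collect the alt-allele count of every entry into one array per column (sample)
mt = mt.annotate_cols(g = hl.agg.collect(mt.GT.n_alt_alleles()))
test = mt.cols().select('g').to_spark()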
and this is what I got:
The problem with this snippet is that hl.agg.collect() doesn’t guarantee the order of the elements in the array, so I can’t tell which variant each feature corresponds to. Does anyone have a solution for this? Would hl.str(mt.locus).collect() work?
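One idea I’m considering is to collect the locus together with each value and then sort by it, so the order becomes deterministic. Below is a plain-Python sketch of that idea only, with made-up data (the tuples and the order_features helper are hypothetical, not Hail API). In Hail I assume the equivalent would be collecting an hl.struct of locus and n_alt_alleles and sorting the array with hl.sorted, but I haven’t tested that.

```python
# Plain-Python illustration of the idea; the data below is made up.
# Each item mimics one collected entry: (contig, position, n_alt_alleles),
# arriving in an arbitrary order, as hl.agg.collect() may return it.
collected = [
    ("1", 3000, 2),
    ("1", 1000, 0),
    ("2", 500, 1),
    ("1", 2000, 1),
]

def order_features(entries):
    """Sort entries by (contig, position) and split the keys from the values."""
    # Note: contigs are compared as strings here, so "10" would sort before "2".
    ordered = sorted(entries, key=lambda e: (e[0], e[1]))
    keys = [(contig, pos) for contig, pos, _ in ordered]
    values = [n_alt for _, _, n_alt in ordered]
    return keys, values

keys, values = order_features(collected)
print(keys)    # feature labels, same order for every sample
print(values)  # feature vector aligned with the labels
```

With the keys kept next to the values, every sample’s array would be in the same order and each position could be traced back to a variant.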
Thanks a lot!