I'm trying to manipulate columns of the UK Biobank first occurrences table in Hail. The table has one column per three-letter ICD code (the code is the column header) and one row per patient (n = number of patients). Each value is either an empty string ("") or the date of first occurrence (e.g., 2002-10-1) of that ICD code in the participant's medical data. There is also a column of patient identifiers, used to match rows to patients.
I would like to replace each date with the ICD code from its column header, and then gather the defined values for each patient. This seems to require iterating over the columns, and I can't figure out how to do that with a Hail table and Hail expression syntax.
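To make the desired result concrete, here is a minimal plain-Python sketch on a made-up three-row table. The column names (`eid`, `A01`, `B02`, `C03`), the dates, and the helper `codes_with_dates` are invented for illustration, and the real data is a Hail table rather than a list of dicts. It only shows the output I am after, not how to get it in Hail.

```python
# Toy stand-in for the first occurrences table: one dict per participant.
# Column names and dates are hypothetical, not real UK Biobank values.
rows = [
    {"eid": 1, "A01": "2002-10-01", "B02": "", "C03": "2010-05-17"},
    {"eid": 2, "A01": "", "B02": "", "C03": ""},
    {"eid": 3, "A01": "", "B02": "1999-01-30", "C03": ""},
]


def codes_with_dates(row, id_col="eid"):
    """Return the ICD column names whose value is a non-empty date string."""
    return [col for col, value in row.items() if col != id_col and value != ""]


# Map each participant identifier to the list of ICD codes they have a date for.
desired = {row["eid"]: codes_with_dates(row) for row in rows}
print(desired)
# {1: ['A01', 'C03'], 2: [], 3: ['B02']}
```

In other words, for each patient I want the set of column headers where the date is present, with the empty-string columns dropped.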
This Stack Overflow question asks and answers a similar question using pandas in Python: https://stackoverflow.com/questions/37032043/how-to-replace-a-value-in-a-pandas-dataframe-with-column-name-based-on-a-conditi
Some things I've tried (with the table described above loaded as fo_table):
Option #1: (ExpressionException: Cannot index with a scalar expression)
# grab the list of column names
fo_table_cols = dict(fo_table.row_value).keys()
fo_table2 = fo_table.annotate(icd=hl.set(hl.array(list(fo_table_cols)).map(lambda x: hl.if_else(hl.is_defined(fo_table[x]), x, hl.null(hl.tstr)))))
Option #2: (ExpressionException: Hail cannot automatically impute type of <class 'collections.abc.KeysView'>)
fo_table2 = fo_table.annotate(icds_to_keep=hl.array(fo_table.row.keys()).map(lambda k: hl.if_else(fo_table.row()[k] != "", k, "")))
Thanks in advance,
Kelly