I am trying to run a GWAS in All of Us and want to use genetic sex. However, when I use imputed_sex, my understanding is that is_female is a boolean where female is True and male is False. As far as I know, for a GWAS everything needs to be coded as either 0 or 1, and I cannot figure out how to recode this to 0 or 1 within the Hail environment. Thanks for any help.
Hi @hmseagle! You can use the annotate method to add new fields to a hail table or the transmute method to overwrite an existing field. Note that for matrix tables, there are instead annotate_rows, annotate_cols and annotate_entries methods (and the same for transmute). Here’s an example using transmute to change the representation of a field:
In [3]: t = hl.utils.range_table(10)
...: t = t.annotate(sex=t.idx % 2 == 0)
...: t.show()
+-------+-------+
| idx | sex |
+-------+-------+
| int32 | bool |
+-------+-------+
| 0 | True |
| 1 | False |
| 2 | True |
| 3 | False |
| 4 | True |
| 5 | False |
| 6 | True |
| 7 | False |
| 8 | True |
| 9 | False |
+-------+-------+
In [4]: t = t.transmute(sex=hl.int(t.sex))
...: t.show()
+-------+-------+
| idx | sex |
+-------+-------+
| int32 | int32 |
+-------+-------+
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
| 9 | 0 |
+-------+-------+
@danielgoldstein Thanks so much for this! Would you mind if I ask you a specific question with my code? I am new to coding and especially Python. This is what I have done so far:
mt = mt.annotate_cols(pheno = phenotypes[mt.s]) #this is my matrix table that has phenotype and genotype information
imputed_sex = hl.impute_sex(mt.GT)
mt = mt.annotate_cols(imputed_sex = imputed_sex[mt.s])
In the step below, I am trying to convert True in the is_female field of the imputed_sex struct to 0 and False to 1. From my understanding, Python automatically changes True to 1 and False to 0, but the ~ will flip this, which is what I need. However, I get an error saying “MatrixTable instance has no field, method, or property ‘transmute’”. I am not sure which function would be appropriate for a specific part of a struct. Thanks so much! This is the line that fails:
mt = mt.transmute(imputed_sex=hl.int(~mt.imputed_sex.is_female))
Good question. In this case, there is no transmute method because you need to specify the “axis” that you are operating on (while for Tables there is only one axis, the vector of rows). Since you created imputed_sex with mt.annotate_cols, it is what we refer to as a “column field”. In order to transmute that field you need to do mt.transmute_cols. I believe what you have there should work.
You can also avoid transmute_cols by doing the negation and cast to int inside the annotate_cols, but that is a more subjective matter of readability. The output and performance should be identical.