I would like to group some columns of a table into a struct for better organization…
I have this
phenotypes = ['Diabetes']
covariates = ['BMI', 'SEX', 'Ethnicity']
And want to go from this
ht.show()
to this:
hta.show()
I know how to do this MANUALLY, like this:
hta = ht.annotate(
    phenotypes=hl.struct(Diabetes=ht.Diabetes),
    covariates=hl.struct(BMI=ht.BMI, Ethnicity=ht.Ethnicity, SEX=ht.SEX),
)
hta = hta.drop('Diabetes', 'BMI', 'Ethnicity', 'SEX')
hta.show()
But I feel there is an easier and more automatable way, where I can use the name of each list as the annotation name and the ht columns named in that list as the values.
Thanks!
Yep! The trick is to build a dict for each list and unpack it into hl.struct:
cov_dict = {name: ht[name] for name in covariates}
pheno_dict = {name: ht[name] for name in phenotypes}
hta = ht.annotate(covariates=hl.struct(**cov_dict), phenotypes=hl.struct(**pheno_dict))
You can also avoid the drop if you use transmute instead of annotate.
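If the grouping is likely to change, the dict-building step can be wrapped in a small helper. This is only a sketch: collect_fields is a name I made up, not part of Hail, and the row values below are invented purely for the quick check. The helper relies on nothing but table[name] lookups, so a plain dict can stand in for a table while you try it out.

```python
def collect_fields(table, **groups):
    """Return {group name: {field name: table[field name]}} for each group.

    Hypothetical helper, not a Hail function. `table` only needs to
    support `table[name]`, so a Hail Table or a plain dict both work.
    """
    return {
        group: {name: table[name] for name in names}
        for group, names in groups.items()
    }


# Quick check with a plain dict standing in for one table row
# (made-up values, for illustration only).
row = {"Diabetes": 1, "BMI": 27.5, "SEX": "F", "Ethnicity": "A"}
grouped = collect_fields(
    row,
    phenotypes=["Diabetes"],
    covariates=["BMI", "SEX", "Ethnicity"],
)
print(grouped)
```

With a real table, each inner dict can then be unpacked into hl.struct and the result passed to transmute. I have not run this against your data, so treat it as a starting point:
groups = collect_fields(ht, phenotypes=phenotypes, covariates=covariates)
hta = ht.transmute(**{group: hl.struct(**fields) for group, fields in groups.items()})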
Excellent! Is transmute “cheaper” to run, or is Hail simply doing a drop itself, making it just a convenience function? I can imagine that a user would “change their mind” about phenotypes vs. covariates for GWAS etc., so this regrouping may happen more than once. If transmute is much more efficient, that would be great to know.
In any case, works like a charm!
It's purely a convenience: transmute is implemented as annotate followed by drop.