Hail 0.2 export gwas results to tsv

I'm trying to export GWAS results to a TSV file via a table, but I can't get rid of the struct formatting. This is how I do it:

results = gwas.rows()
results.export('file:////root/logreg_wald.tsv.bgz')

This works, but the resulting file requires a lot of parsing:

locus alleles rsid cm_position logreg
1:768448 ["G","A"] rs12562034 0.0000e+00 {"beta":0.24564585404573575,"standard_error":0.09766742410866683,"z_stat":2.515125757513836,"p_value":0.011898993159370147,"fit":{"n_iterations":4,"converged":true,"exploded":false}}

The explode() method does not apply here (and does not apply to structures). Is there a quick way to save the results as different columns? (logreg.beta etc would be just fine)

Table.flatten is what you’re looking for here!
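
As a rough illustration of what flattening does to a row like the one in the question, here is a plain-Python sketch. It is not Hail's implementation: the `flatten` helper and the shortened row are made up for the example. The idea is that nested struct fields become top-level columns with dot-joined names such as logreg.beta. In Hail itself, calling results.flatten() before results.export(...) should give the equivalent columns.

```python
def flatten(row, prefix=""):
    """Hypothetical helper: flatten nested dicts into dot-joined keys."""
    flat = {}
    for name, value in row.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested "structs", carrying the parent name along.
            flat.update(flatten(value, key))
        else:
            flat[key] = value
    return flat


# Shortened, rounded stand-in for one exported row (illustrative values only).
row = {
    "locus": "1:768448",
    "rsid": "rs12562034",
    "logreg": {
        "beta": 0.2456,
        "p_value": 0.0119,
        "fit": {"n_iterations": 4, "converged": True},
    },
}

flat_row = flatten(row)
for column, value in flat_row.items():
    print(column, value, sep="\t")
```

Each nested field ends up as its own column, which is the shape a TSV reader can handle without extra parsing.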

you can also do something like

results = results.select(**results.logreg)

in between the two lines you have there

That makes perfect sense, thank you. I was looking for a flatten option in rows and export, but didn’t think to look if a table.flatten existed.

One more thing though, I don’t understand the two stars in results.select(**results.logreg), what do they stand for? I saw this for ds.annotate() also.

It unpacks the struct results.logreg into top-level keyword arguments, one per field (field name and expression):
http://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/
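
A small plain-Python sketch of the same mechanism, with no Hail involved. The `select` function below is a made-up stand-in that only records which keyword arguments it received, and the dict stands in for the logreg struct with rounded values.

```python
def select(**fields):
    """Hypothetical stand-in: return the keyword arguments it was given."""
    return fields


# A dict playing the role of the logreg struct (illustrative values only).
logreg = {"beta": 0.2456, "standard_error": 0.0977, "p_value": 0.0119}

# ** spreads the mapping into separate keyword arguments...
unpacked = select(**logreg)

# ...which is the same as spelling every field out by hand.
explicit = select(
    beta=logreg["beta"],
    standard_error=logreg["standard_error"],
    p_value=logreg["p_value"],
)

print(unpacked == explicit)
print(list(unpacked))
```

So select(**results.logreg) is shorthand for passing each field of the struct as its own named argument.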

It's usually a good idea to avoid the ** in annotate unless you're sure you want it, as it can silently overwrite existing fields that share a name with an unpacked field. Generally annotate_cols(phenos = pheno_table[mt.s]) is safer than annotate_cols(**pheno_table[mt.s]), but it requires you to access fields as mt.phenos.pheno1 rather than mt.pheno1.
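
To make the overwrite risk concrete, here is a plain-Python analogy rather than Hail code. The `annotate` function, the sample record, and the phenotype values are all invented for the example; the point is only that unpacked fields land at the top level and replace anything already there with the same name, while a named field keeps them grouped.

```python
def annotate(row, **new_fields):
    """Hypothetical stand-in: return a copy of row with new_fields added."""
    return {**row, **new_fields}


# Made-up column fields and a made-up phenotype record sharing the name "sex".
sample = {"s": "NA12878", "sex": "F"}
phenos = {"sex": "unknown", "pheno1": 1.7}

# Unpacking: "sex" from phenos silently replaces the existing "sex".
unpacked = annotate(sample, **phenos)

# Nesting under one name: the original "sex" survives untouched.
nested = annotate(sample, phenos=phenos)

print(unpacked["sex"])
print(nested["sex"], nested["phenos"]["pheno1"])
```

The nested form costs one extra level of field access but cannot clobber an existing field by accident.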

Great, thank you both, very helpful :)