Export plink data merged from .bim/.bed/.fam files into Excel or CSV format

I’m working with genomic data for the first time.
First of all, congratulations and thanks to the Hail team for coming up with a solution for big data in this field.

My question is: can I export the merged PLINK data (from all three .bim/.bed/.fam files) in Excel or CSV format?
I have already imported the PLINK files (.bim/.bed/.fam) into Hail.


Exporting directly to Excel is not an option, but exporting a delimited text file may be possible. Can you write out a few lines of the file you’d like to produce?

What I did so far is:
vds = hc.import_plink(bed="plinkdata/dataMales.bed", bim="plinkdata/dataMales.bim", fam="plinkdata/dataMales.fam")


and the output is:
Samples: 5
Variants: 736991
Call Rate: 0.999938
Contigs: ['12', '8', '19', '4', '15', '11', '9', '22', '13', '16', '5', '10', '21', '6', '1', '17', '14', '20', '2', '18', '7', '3']
Multiallelics: 0
SNPs: 736991
MNPs: 0
Insertions: 0
Deletions: 0
Complex Alleles: 0
Star Alleles: 0
Max Alleles: 2

But now I have to transpose my data so that I have 736991 features (SNPs), one per column, to apply ranking and feature selection, most probably with Random Forest.
So I thought of exporting this data to a txt/csv file first, instead of digging into it and getting lost forever (as I’m new to Python as well).
If that’s not possible, is there any other way to deal with this?
I hope you understand my point.

and if I do:
it exports the data into the same three PLINK files again, which I don’t want!

You’ll have to use another tool to transpose, but this should let you export to a text file:

table = vds.make_table('v = v', ['`` = g.gt'])
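
For the transpose step outside Hail, here is a minimal sketch using only the Python standard library. It assumes the table above has already been written to a tab-delimited file with one row per variant and one column per sample. The file names, sample IDs, and variant strings below are made up for illustration and are not taken from your data.

```python
import csv
import os
import tempfile


def transpose_delimited(src_path, dst_path, delimiter="\t"):
    """Read a delimited file fully into memory and write its transpose."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    # zip(*rows) turns columns into rows; this assumes every row has the
    # same number of fields (zip would silently truncate otherwise).
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        writer.writerows(zip(*rows))


# Hypothetical stand-in for the exported file: a header row, then one row
# per variant with one genotype column per sample.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "genotypes.tsv")
dst = os.path.join(workdir, "genotypes_by_sample.tsv")
with open(src, "w", newline="") as f:
    f.write("v\tsampleA\tsampleB\n")
    f.write("1:1000:A:G\t0\t1\n")
    f.write("2:2000:C:T\t2\tNA\n")

transpose_delimited(src, dst)

with open(dst) as f:
    transposed = f.read()
print(transposed)
```

After the transpose, each row is one sample and each column is one variant, which is the layout most feature-selection tools expect. This reads the whole file into memory, which should be workable for 5 samples and roughly 737K variants, but a much larger sample count would need a chunked approach instead.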

Thanks, it actually worked!

But can I transpose my (merged) data directly after importing the PLINK binary files?

Hail does not currently support transposition of the variant-sample matrix.

What will you do with the transposed data? The answer to that question will help us understand the best way to help you.

Thanks, Danking.
Well, I have to transpose my data to apply ranking and feature selection on the SNPs, and in this case I’ll have 700K+ SNPs.