Convert posterior probabilities in .gen to dosages

phelkkula · December 12, 2017, 9:53pm

Hi,
After importing a .gen file, how do I write genotype dosages to a file or a pandas dataframe? I found the dosage method mentioned in the documentation, but I still don’t know how to go about doing this.

tpoterba · December 12, 2017, 10:09pm

Can you be a bit more specific about what you want to do with the dosages? Moving data from Hail / Spark into a pandas dataframe is very inefficient.

phelkkula · December 12, 2017, 10:59pm

Yes, I see. I would like to compare (e.g. by correlation) two sets of dosages with each other.

tpoterba · December 12, 2017, 11:05pm

Got it. This isn’t something you can do within Hail 0.1 (it will definitely be possible in the next version).

I think make_table can make the text file you want:

# vds is the dataset from the imported .gen
table = vds.make_table('v = v', ['`` = g.dosage']) # make a table with one col per sample
table.export('file.tsv')

Then you can read it in with pandas / anything else.
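A minimal pandas sketch of that comparison, assuming each exported file has a variant key column named `v` and one dosage column per sample. The helper names, the tiny in-memory tables, and the `22:100:A:G` variant strings are made up for illustration; check the header of your real `file.tsv` before relying on the column names.

```python
import io
import pandas as pd

def load_dosages(source):
    """Read a tab-separated dosage table: a key column 'v' plus one column per sample (assumed layout)."""
    return pd.read_csv(source, sep="\t", index_col="v")

def per_variant_correlation(a, b):
    """Pearson correlation across shared samples, computed separately for each shared variant."""
    variants = a.index.intersection(b.index)
    samples = a.columns.intersection(b.columns)
    # Transpose so that variants become columns; corrwith then correlates column by column.
    return a.loc[variants, samples].T.corrwith(b.loc[variants, samples].T)

# Made-up tables standing in for two exported files; pass file paths instead for real data.
tsv_a = "v\ts1\ts2\ts3\n22:100:A:G\t0.0\t1.0\t2.0\n22:200:C:T\t0.1\t0.9\t1.8\n"
tsv_b = "v\ts1\ts2\ts3\n22:100:A:G\t0.0\t1.0\t2.0\n22:200:C:T\t2.0\t1.0\t0.0\n"

a = load_dosages(io.StringIO(tsv_a))
b = load_dosages(io.StringIO(tsv_b))
r = per_variant_correlation(a, b)
print(r)
```

Restricting both tables to their shared variants and samples first keeps the correlation from silently pairing mismatched rows or columns.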

phelkkula · December 13, 2017, 11:32am

Excellent, thank you

phelkkula · December 19, 2017, 12:28pm

I noticed that the genotype to dosage conversion is erroneous in some cases. For example, a genotype listed in the gen file:

22 --- rs16981741 17309881 A G 0.999982 1.78e-05 0

is converted to the following dosage:

     chr         pos  ref  altAlleles         1
0     22    17309881    A    [(A, G)]  0.000031
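For comparison, here is the dosage I would expect from that line, assuming the three probabilities are ordered hom-ref, het, hom-alt and that dosage means the expected number of alternate alleles, P(het) + 2 * P(hom-alt). The variable names are just for illustration.

```python
# The .gen line quoted above: six leading fields, then one probability triple per sample.
line = "22 --- rs16981741 17309881 A G 0.999982 1.78e-05 0"
fields = line.split()

# Assumed order of the triple: P(hom-ref), P(het), P(hom-alt).
p_hom_ref, p_het, p_hom_alt = (float(x) for x in fields[6:9])

# Expected number of alternate alleles.
expected_dosage = p_het + 2.0 * p_hom_alt
print(expected_dosage)
```

Under those assumptions the expected dosage is 1.78e-05, not the 0.000031 shown in the table.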

tpoterba · December 19, 2017, 3:45pm

Hail 0.1 stores genotype probabilities with a precision of 1 / 2**15. The heterozygous probability in that line (1.78e-05) is below that step size, so it is being rounded to the smallest representable nonzero value:

In [22]: 1.0 / (2 ** 15)
Out[22]: 3.0517578125e-05
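A rough sketch of the effect, assuming plain round-to-nearest onto a grid of 1 / 2**15 steps. This is not Hail's actual code (which also has to keep the three values summing to one); the `quantize` helper is hypothetical and only illustrates why the dosage comes out as it does.

```python
SCALE = 2 ** 15  # number of steps between 0 and 1

def quantize(probs, scale=SCALE):
    """Round each probability to the nearest multiple of 1/scale (simplified, hypothetical)."""
    return [round(p * scale) / scale for p in probs]

probs = [0.999982, 1.78e-05, 0.0]  # hom-ref, het, hom-alt from the .gen line
q = quantize(probs)

# 1.78e-05 * 2**15 is about 0.58, which rounds up to one step.
dosage = q[1] + 2.0 * q[2]
print(q, dosage)
```

With this rounding the het probability becomes exactly one step, so the dosage equals 1 / 2**15, which a dataframe displays as 0.000031.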
