Annotating samples with a specific genotype dosage

Stephane_Bourgeois · November 17, 2017, 12:21pm

Hi,

I’m trying to find the right way to annotate samples with a specific genotype dosage, to be used later as a genetic covariate in regressions, but can’t find the right way of doing it. At the moment I’m exploring something like this and then exporting it to a table to annotate the vds:

vds.filter_variants_list([Variant.parse(‘1:123456:G:A’)]).annotate_samples_expr(‘sa.gCOV1 = g.dosage()’)

Help please.

tpoterba · November 17, 2017, 12:40pm

g isn’t in scope in annotate_samples_expr, you’ll need to use gs:


vds2 = vds.filter_variants_list([Variant.parse('1:123456:G:A')])\
          .annotate_samples_expr('sa.gCOV1 = gs.collect()[0].dosage()')

vds = vds.annotate_samples_table(vds2.samples_table(), expr='va.gCOV1 = table.gCOV1')

I could be slightly wrong with syntax, but this model will definitely work.

Stephane_Bourgeois · November 17, 2017, 4:05pm

gs I see… didn’t think of it. Thanks a lot

Stephane_Bourgeois · November 17, 2017, 4:10pm

One more thing I’m just thinking of (sorry), would that work if the genotypes are probabilities and not hard calls?

tpoterba · November 17, 2017, 4:15pm

I don’t think this would work with anything other than BGEN data, actually, since we call .dosage() after collecting the genotype.

Stephane_Bourgeois · November 17, 2017, 4:39pm

Wouldn’t dosage be calculated on the fly in other cases? If not, is there a way in Hail to populate the dosage field using calculation from the hard calls?

In the next version of Hail, as it will be possible to import the GP field from vcf files, will this be stored in Hail as dosage or will you keep the actual probabilities? (which would make more sense, and that’s also true of probabilities in bgens for that matter).

tpoterba · November 17, 2017, 4:42pm

I think dosage() will be missing if called on a hard-call. This probably isn’t ideal!

In 0.2, the dependence on the 5 standard GATK fields (GT, AD, DP, GQ, PL) goes away. It will be super easy to use the GP field, which will be kept as the actual probabilities until you explicitly transform it to allele dosage.

Stephane_Bourgeois · November 17, 2017, 4:50pm

That’s great, thanks.

Topic		Replies	Views
Convert posterior probabilities in .gen to dosages Help [0.1]	6	1018	December 19, 2017
Using Genotype Probabilities in logreg from import_vcf Hail Query & hailctl	3	464	October 24, 2018
GWAS with imputed data Hail Query & hailctl	11	990	March 10, 2020
Error when trying to annotate a new row with a genotypes of the sample Hail Query & hailctl	2	330	July 13, 2023
Generic genotype schemas in Hail Updates	0	1099	April 7, 2017

Annotating samples with a specific genotype dosage

Related topics