Using Genotype Probabilities in logreg from import_vcf


#1

Hi,

I can’t see how to use the genotype probabilities (GP) instead of the hard calls when importing a VCF.

ds = hl.import_vcf('22.vcf.bgz', call_fields=["GP"], skip_invalid_loci=True)

Hail version: 0.2-3b08196a75cb
Error summary: HailException: Can only convert a header line with type 'String' to a call type. Found 'Float'.

ds.describe()

Global fields:
    None
Column fields:
    's': str
Row fields:
    'locus': locus
    'alleles': array
    'rsid': str
    'qual': float64
    'filters': set
    'info': struct {
        AC: array,
        AN: int32,
        RefPanelAF: array,
        TYPED: bool,
        INFO: float64
    }
Entry fields:
    'GT': call
    'ADS': array
    'DS': float64
    'GP': array
Column key: ['s']
Row key: ['locus', 'alleles']

I couldn’t find anything in the Hail 0.2 docs about using genotype probabilities. Any help would be appreciated.

Stephane


#2

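Hail treats call fields (GT, PGT) specially by assigning them the call type. GP holds probabilities, not a call, so it can't be listed in call_fields. Drop that argument and GP is imported as a plain array<float>, as your describe() output shows. Depending on what you want to do downstream, there are functions that can help manipulate it, like computing the dosage: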

https://hail.is/docs/devel/functions/genetics.html?highlight=gp_dosage#hail.expr.functions.gp_dosage
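For intuition, the dosage of a biallelic site is the expected number of alternate alleles, i.e. GP[1] + 2 * GP[2], where GP is ordered (hom-ref, het, hom-alt). Below is a minimal plain-Python sketch of that arithmetic, not Hail code; the helper name gp_dosage_py is hypothetical, and the length check is this sketch's own choice.

```python
from typing import Sequence


def gp_dosage_py(gp: Sequence[float]) -> float:
    """Expected alternate-allele count at a biallelic site (hypothetical helper).

    gp holds the probabilities of (hom-ref, het, hom-alt), in VCF GP order,
    so dosage = 0 * P(hom-ref) + 1 * P(het) + 2 * P(hom-alt).
    """
    # This sketch only handles biallelic sites (exactly three probabilities).
    if len(gp) != 3:
        raise ValueError("expected a biallelic GP array of length 3")
    return gp[1] + 2 * gp[2]


# A confident het call gives dosage 1.0; an uncertain call falls between
# the hard-call values 0, 1 and 2 instead of being rounded to one of them.
print(gp_dosage_py([0.0, 1.0, 0.0]))  # 1.0
print(gp_dosage_py([0.1, 0.6, 0.3]))
```

This is why the dosage is a natural regression input: it keeps the imputation uncertainty that a hard call throws away.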


#3

That does sound like what I’m looking for.
How do I include the gp dosage in logreg, something like this?

hl.logistic_regression_rows(test='wald', y=ds.is_case, x=hl.gp_dosage(ds.GP), covariates=[1, ds.is_female])


#4

Yes, that looks good.