UKBiobank chromosome XY


#21

This looks like two related bugs we’ve fixed:

Try an update?


#22

OK, every time I update, things go wrong and I end up reinstalling the whole thing, so I must be doing something wrong…

git clone https://github.com/hail-is/hail.git
cd hail/hail
./gradlew -Dspark.version=2.2.0 shadowJar archiveZip
cd …

That compiles, and then I copy hail over to where the previous version is located: cp -R hail /usr/local/
After that, it doesn’t work anymore. What am I missing?


#23

it doesn’t work anymore

Can you elaborate?
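If the failure happens at import time, a quick check like the one below would help narrow it down. It is only a sketch: `locate_module` is a hypothetical helper name, and it uses nothing beyond the standard library. It reports which Python interpreter is running and where (if anywhere) that interpreter would load the `hail` package from, without actually importing it.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the path Python would import top-level module `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Which interpreter is this? (A system Python and a Conda Python can differ.)
print("interpreter:", sys.executable)
# Where would `import hail` load from? Prints None if Python cannot find it.
print("hail found at:", locate_module("hail"))
```

If the second line prints `None`, or a path belonging to the old install, then the interpreter in use is not seeing the freshly copied build.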


#24

I get back into python and it doesn’t find Hail.


#25

OK, never mind, now it works!
I wasn’t using Conda before; I think that was the issue.


#26

Or does it… I’m still getting the same error.

After copying the new Hail over, I updated the conda environment as follows:
conda-env update -n hail -f $HAIL_HOME/python/hail/environment.yml

After attempting the logistic regression on X:

Hail version: 0.2.5-b9537d16564d
Error summary: AssertionError: assertion failed: is_female not in struct{__y: float64, __cov0: float64, __cov1: float64, __cov2: float64, __cov3: float64, __cov4: float64, __cov5: float64, __cov6: float64, __cov7: float64, __cov8: float64}


#27

ok, must be something different – can you give us the full stack trace and the pipeline that replicates it?


#28

Sure, here is the script; I will send you the log:

import hail as hl
import hail.expr.aggregators as agg
hl.init()
from pprint import pprint
from bokeh.io import output_notebook, show, export_png
from bokeh.layouts import gridplot
from bokeh.models import Span

import os

ds = hl.read_matrix_table('/mnt/output/sb/V/M/imputed_genotypes/HRC.vcfs/HRT_QCed_annotated_final.mt')

rg = ds.locus.dtype.reference_genome

x_contigs = set(rg.x_contigs)

y_contigs = set(rg.y_contigs)

autosomes = [c for c in rg.contigs if c not in x_contigs and c not in y_contigs]

mt_auto = hl.filter_intervals(ds, [hl.parse_locus_interval(c, rg) for c in autosomes])

mt_x = hl.filter_intervals(ds, [hl.parse_locus_interval(c, rg) for c in x_contigs])

x_chr_var = (hl.case()
    .when(mt_x.is_female | mt_x.locus.in_x_par(), hl.gp_dosage(mt_x.GP))
    .default(hl.sum(mt_x.GP * [0, 2])))

gwas_x = hl.logistic_regression_rows(
    x=x_chr_var,
    y=mt_x.pheno_case,
    covariates=[1, mt_x.is_female, mt_x.age, mt_x.weight,
                mt_x.PC1, mt_x.PC2, mt_x.PC3, mt_x.PC4, mt_x.PC5],
    test='wald',
    pass_through=[mt_x.rsid, mt_x.variant_qc, mt_x.EA, mt_x.NEA, mt_x.EAF])
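
For anyone following the script, the dosage arithmetic behind `x_chr_var` can be sketched in plain Python. This assumes that `hl.gp_dosage` computes `GP[1] + 2 * GP[2]` for a three-element probability array, and that the default branch receives a two-element `GP` for hemizygous samples, as the `[0, 2]` weights suggest. The second point is an assumption about the data, not something confirmed in this thread, and `expected_dosage` is a hypothetical helper name.

```python
def expected_dosage(gp, weights):
    """Weighted sum of genotype probabilities: sum(gp[i] * weights[i])."""
    if len(gp) != len(weights):
        raise ValueError("gp and weights must have the same length")
    return sum(p * w for p, w in zip(gp, weights))


# Diploid case (females, or the PAR): weights 0, 1, 2 give GP[1] + 2 * GP[2].
diploid = expected_dosage([0.25, 0.5, 0.25], [0, 1, 2])

# Assumed hemizygous case: two probabilities, with the alternate allele coded as 2.
hemizygous = expected_dosage([0.25, 0.75], [0, 2])

print(diploid, hemizygous)
```

Coding the hemizygous alternate allele as 2 rather than 1 puts male and female dosages on the same 0 to 2 scale, which appears to be the intent of the `default` branch.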

#29

ok, I’m pretty baffled.

Could you put the following above the last line and paste the output here (feel free to edit names of fields):

print(mt_x._jmir.typ().colType().parsableString())

#30

also print(mt_x.col.dtype)


#31

ah wait I might see an issue…


#32

can you also send the Python stack trace in an email?


#33

(or here)