Importing XY pseudoautosomal data into Hail


#1

What is the currently recommended method for importing pseudoautosomal XY data into Hail? I am using the UKBB imputed data in BGEN format for the XY chromosome. If you try to load the data as XY, it fails with:

Hail version: 0.2-961f76d14f1e
Error summary: HailException: Invalid locus `XY:60014' found. Contig `XY' is not in the reference genome `GRCh37'.

If you translate the bgen so that XY is just coded as X instead, it errors with:

Hail version: 0.2-961f76d14f1e
Error summary: HailException: Hail only supports diploid genotypes. Found min ploidy `1' and max ploidy `2'.
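For reference, the XY-to-X recoding can also be expressed inside Hail instead of rewriting the file. The following is only a minimal sketch: `par_contig_recoding` and `index_and_import` are hypothetical helper names, the paths are placeholders, and it assumes the installed Hail version accepts `contig_recoding` on `hl.index_bgen` (in some versions the parameter may live elsewhere). It only changes the contig label, so it would not be expected to avoid the ploidy error quoted above.

```python
def par_contig_recoding(source_contigs=("XY",), target="X"):
    """Map the contig labels used in the file onto a contig of the reference genome.

    Hypothetical helper: the default labels are an assumption based on the
    error message, not something Hail provides.
    """
    return {contig: target for contig in source_contigs}


def index_and_import(bgen_path, sample_path):
    """Index a BGEN file with recoded contigs, then import it (illustrative only)."""
    # Imported lazily so the helper above works without Hail installed.
    import hail as hl

    # Assumption: index_bgen takes contig_recoding in the installed version.
    hl.index_bgen(
        bgen_path,
        reference_genome="GRCh37",
        contig_recoding=par_contig_recoding(),
    )
    return hl.import_bgen(
        bgen_path,
        entry_fields=["GT", "GP"],
        sample_file=sample_path,
    )
```

If the file uses more than one label for the region, the extra labels can be passed in, for example `par_contig_recoding(("XY", "PAR"))`.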

If you translate the bgen so that XY is coded as chr 1 just to see what happens, it errors with:

Hail version: 0.2-961f76d14f1e
Error summary: HailException: Hail only supports 8-bit probabilities, found 16.

I am guessing that there are some specific genotypes in the XY file that it doesn’t like, but I’m not sure what I am looking for.
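One way to see what the file actually contains is to read the values the errors mention (contig label, ploidy range, probability bit depth) straight from the first variant block. This is a rough sketch based on my reading of the BGEN layout 2 format, not a Hail feature: `inspect_first_variant` is a hypothetical helper, it handles only uncompressed or zlib-compressed blocks, and it looks at the first variant only.

```python
import struct
import zlib


def inspect_first_variant(path):
    """Return contig, ploidy range and bit depth of the first variant in a BGEN file."""
    with open(path, "rb") as f:
        # First 4 bytes: offset of the first variant block, counted from byte 4.
        (offset,) = struct.unpack("<I", f.read(4))
        header_len, n_variants, n_samples = struct.unpack("<III", f.read(12))
        magic = f.read(4)
        if magic not in (b"bgen", b"\x00\x00\x00\x00"):
            raise ValueError("not a BGEN file")
        # The flags are the last 4 bytes of the header block (which starts at byte 4).
        f.seek(header_len)
        (flags,) = struct.unpack("<I", f.read(4))
        compression = flags & 0b11       # 0 = none, 1 = zlib, 2 = zstd
        layout = (flags >> 2) & 0b1111   # 1 or 2
        if layout != 2:
            raise ValueError("this sketch only handles layout 2")

        f.seek(4 + offset)
        # Variant id, rsid and contig are each stored with a 2-byte length prefix.
        labels = []
        for _ in range(3):
            (length,) = struct.unpack("<H", f.read(2))
            labels.append(f.read(length).decode("utf-8", "replace"))
        position, n_alleles = struct.unpack("<IH", f.read(6))
        for _ in range(n_alleles):
            (length,) = struct.unpack("<I", f.read(4))
            f.seek(length, 1)  # skip the allele string

        # Genotype block: total length, then (if compressed) the uncompressed length.
        (block_len,) = struct.unpack("<I", f.read(4))
        if compression == 0:
            data = f.read(block_len)
        elif compression == 1:
            f.seek(4, 1)  # skip the uncompressed-length field
            data = zlib.decompress(f.read(block_len - 4))
        else:
            raise ValueError("zstd-compressed blocks need a third-party library")

    # Probability data: N, K, min ploidy, max ploidy, N ploidy bytes, phased flag, bits.
    n, _k, min_ploidy, max_ploidy = struct.unpack_from("<IHBB", data, 0)
    _phased, bits = struct.unpack_from("<BB", data, 8 + n)
    return {
        "layout": layout,
        "compression": compression,
        "contig": labels[2],
        "position": position,
        "min_ploidy": min_ploidy,
        "max_ploidy": max_ploidy,
        "bits": bits,
    }
```

Calling `inspect_first_variant` on the XY file and on an autosomal file from the same release should show whether the ploidy range or the bit depth differs between them.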


#2

Which BGEN file is this coming from?

We designed import_bgen to work with the v2 release of UKBB. If there is a newer release, we should support that.


#3

Ah, this seems to be v3: ukb_imp_chrXY_v3.bgen (https://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=1125). Note that the first variant is shown as chr PAR, then it switches to chr XY after that… but regardless of how I recode the chromosome, it won’t load.