Locus erring about the wrong reference genome

Hey there. We are trying to force the locus to use GRCh38, but when plotting, an error is raised about GRCh37.

script

import hail as hl
hl.init()

hl.get_reference(name='GRCh38')
# ^ just in case this is treated like an environment variable

ctbl = hl.import_table(
    'ctbl_export.tsv'
    , types = {
        'P': hl.tfloat32
        , 'locus': hl.tlocus(reference_genome='GRCh38')
    }
)

hl.plot.manhattan(ctbl.P, ctbl.locus)

error

Hail version: 0.2.16-6da0d3571629

/hail/utils/java.py", line 240, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None

Error summary: HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.

what’s the full stacktrace?

2019-07-08 17:32:27 Hail: INFO: Reading table with no type imputation
  Loading column 'CHROM' as type 'str' (type not specified)
  Loading column 'POS' as type 'str' (type not specified)
  Loading column 'ID' as type 'str' (type not specified)
  Loading column 'REF' as type 'str' (type not specified)
  Loading column 'ALT' as type 'str' (type not specified)
  Loading column 'A1' as type 'str' (type not specified)
  Loading column 'TEST' as type 'str' (type not specified)
  Loading column 'OBS_CT' as type 'str' (type not specified)
  Loading column 'OR' as type 'str' (type not specified)
  Loading column 'SE' as type 'str' (type not specified)
  Loading column 'Z_STAT' as type 'str' (type not specified)
  Loading column 'P' as type 'float32' (user-specified)
  Loading column 'PHENO' as type 'str' (type not specified)
  Loading column 'GC' as type 'str' (type not specified)
  Loading column 'QQ' as type 'str' (type not specified)
  Loading column 'BONF' as type 'str' (type not specified)
  Loading column 'info_score_freq' as type 'str' (type not specified)
  Loading column 'info_score_col' as type 'str' (type not specified)
  Loading column 'info_score_information' as type 'str' (type not specified)
  Loading column 'info_score_hg19_chrom' as type 'str' (type not specified)
  Loading column 'info_score_hg19_pos' as type 'str' (type not specified)
  Loading column 'info_score_hg19_qStrand' as type 'str' (type not specified)
  Loading column 'info_score_hg19_liftoverStatus' as type 'str' (type not specified)
  Loading column 'gene_symbol' as type 'str' (type not specified)
  Loading column 'VEP_Max_Impact' as type 'str' (type not specified)
  Loading column 'VEP_max_consequence' as type 'str' (type not specified)
  Loading column 'locus' as type 'locus<GRCh38>' (user-specified)

Traceback (most recent call last):
  File "/var/folders/1z/0ky89yln5rx86068554kzgk00000gn/T/Rtmp07GDXE/chunk-code-62bf5a694648.txt", line 14, in <module>
    hl.plot.manhattan(ctbl.P, ctbl.locus)
  File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-1362>", line 2, in manhattan
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/plot/plots.py", line 1378, in manhattan
    contig_ticks = hail.eval([hail.locus(contig, int(ref.lengths[contig]/2)).global_position() for contig in observed_contigs])
  File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-514>", line 2, in eval
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 190, in eval
    return eval_timed(expression)[0]
  File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-512>", line 2, in eval_timed
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 156, in eval_timed
    return Env.backend().execute(expression._ir, True)
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/utils/java.py", line 240, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.

Java stack trace:
is.hail.utils.HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
	at is.hail.utils.package$.fatal(package.scala:75)
	at is.hail.variant.ReferenceGenome.checkLocus(ReferenceGenome.scala:249)
	at is.hail.codegen.generated.C7.method3(Unknown Source)
	at is.hail.codegen.generated.C7.method1(Unknown Source)
	at is.hail.codegen.generated.C7.apply(Unknown Source)
	at is.hail.codegen.generated.C7.apply(Unknown Source)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12$$anonfun$apply$2.apply(CompileAndEvaluate.scala:99)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12$$anonfun$apply$2.apply(CompileAndEvaluate.scala:85)
	at is.hail.utils.package$.using(package.scala:597)
	at is.hail.annotations.Region$.scoped(Region.scala:11)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12.apply(CompileAndEvaluate.scala:85)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:20)
	at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:84)
	at is.hail.backend.Backend.execute(Backend.scala:86)
	at is.hail.backend.Backend.executeJSON(Backend.scala:92)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

ok, this is definitely a bug in the manhattan function. Tracking issue here:

For now, you can fix this by changing the second line to:

hl.init(default_reference='GRCh38')

Thank you very much. Glad we helped you squash one.

Hi all, I am trying to load my VCF file using this command: hl.import_vcf(myvcf).write('hailvcf.mt', overwrite=True)
and I get the error below:
Error summary: HailException: Invalid locus 'chr12:97324738' found. Contig 'chr12' is not in the reference genome 'GRCh37'.

In Hail's GRCh37, contigs are named 1, 2, …, 22, X, and Y. In GRCh38, the contigs are named chr1, chr2, …, chrX, chrY. By default, import_vcf assumes you have GRCh37 data. If your data is encoded in GRCh38, you should specify that in import_vcf using the reference_genome parameter (reference_genome='GRCh38').

If your data is encoded in GRCh37 but erroneously has the chr prefix, you can remove it using contig_recoding, for example:

hl.import_vcf(...,
              contig_recoding={'chr1': '1', 'chr2': '2', ..., 'chrX': 'X', ...})
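For what it's worth, both cases can be sketched in a few lines. This is a minimal sketch, not an official recipe: the helper names chr_prefix_recoding and load_vcf are hypothetical, and it assumes the only contigs that need renaming are the autosomes plus X and Y (extend the list if your file also has a mitochondrial contig). The only Hail call used is hl.import_vcf with its reference_genome and contig_recoding parameters.

```python
def chr_prefix_recoding():
    """Map 'chr1'..'chr22', 'chrX', 'chrY' to the GRCh37 names '1'..'22', 'X', 'Y'."""
    contigs = [str(i) for i in range(1, 23)] + ['X', 'Y']
    return {'chr' + c: c for c in contigs}


def load_vcf(path, build):
    """Hypothetical helper: import a VCF whose contigs carry the chr prefix.

    build='GRCh38': the prefix is expected, so just name the reference.
    build='GRCh37': the prefix is wrong for Hail's GRCh37, so strip it on import.
    """
    import hail as hl  # imported here so the recoding map can be built without Hail

    if build == 'GRCh38':
        return hl.import_vcf(path, reference_genome='GRCh38')
    if build == 'GRCh37':
        return hl.import_vcf(path,
                             reference_genome='GRCh37',
                             contig_recoding=chr_prefix_recoding())
    raise ValueError('unsupported build: ' + repr(build))
```

Building the dictionary with a comprehension avoids typing out all 24 entries by hand, which is where typos tend to creep in.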

@danking, thanks for the help. I followed the instructions you gave me earlier, and now this is the error I get:
Error summary: HailException: Invalid locus 'chr6_apd_hap1:838122' found. Contig 'chr6_apd_hap1' is not in the reference genome 'GRCh37'.

Do I need to replace all occurrences of, let's say, 'chr6' with '6'?

These are hg19 “alternative contigs”. Hail doesn’t support these contigs, and most current Hail users do not use them for association analysis. You can explicitly remove them by passing a regular expression as the filter argument (e.g. filter="chr6_apd_hap1|chr6_cox_hap2|..."). You can also remove all invalid loci with skip_invalid_loci=True. If you use skip_invalid_loci=True, you should verify that your dataset still contains all the contigs you expect. There are many ways to explore this; I recommend starting with:

mt.locus.summarize()
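A rough sketch of the two options, in case it helps. Assumptions: ALT_CONTIG_PATTERN is a pattern made up for this example that matches contig names containing an underscore (this covers the hg19 haplotype names like chr6_apd_hap1, but also the unplaced and random contigs), the helper names are hypothetical, and filter is assumed to drop any line the regex matches, as the import_vcf documentation describes. The contig_recoding argument is passed through because hg19 data with the chr prefix still needs it.

```python
import re

# Assumed pattern: a data line whose first (CHROM) field starts with 'chr'
# and contains an underscore, e.g. 'chr6_apd_hap1'. Header lines start with
# '#', so they do not match.
ALT_CONTIG_PATTERN = r'^chr[^\t]*_'


def is_alt_contig_line(line):
    """Return True if a VCF line would be matched by ALT_CONTIG_PATTERN."""
    return re.search(ALT_CONTIG_PATTERN, line) is not None


def import_without_alt_contigs(path, use_filter=True, contig_recoding=None):
    """Hypothetical helper: import a VCF while dropping unsupported contigs."""
    import hail as hl  # imported here so the regex can be tried without Hail

    if use_filter:
        # Drop only the lines the pattern matches.
        mt = hl.import_vcf(path, filter=ALT_CONTIG_PATTERN,
                           contig_recoding=contig_recoding)
    else:
        # Drop every locus that is not valid for the reference genome.
        mt = hl.import_vcf(path, skip_invalid_loci=True,
                           contig_recoding=contig_recoding)

    # Number of variants per contig, to confirm nothing expected went missing.
    contig_counts = mt.aggregate_rows(hl.agg.counter(mt.locus.contig))
    return mt, contig_counts
```

Trying the pattern on a few lines of your own file with is_alt_contig_line before importing is a cheap way to check it does not drop more than you intend.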

Hi, what about the error "FatalError: HailException: Invalid locus '23:205383' found. Contig '23' is not in the reference genome 'GRCh37'"? Do you know how I can solve this?

Is 23 supposed to refer to the X chromosome? The reference genome GRCh37 uses X for that chromosome, not 23. You can probably use the contig_recoding argument on import_vcf to fix this: contig_recoding={'23': 'X', '24': 'Y', '25': 'MT'} or something similar.

After doing what you wrote, I now get this error: Error summary: HailException: Invalid locus '26:3396' found. Contig '26' is not in the reference genome 'GRCh37'.

Because I got an error when applying LD pruning in Hail, I did the pruning in PLINK, passed the PLINK output to hwe_normalized_pca(), and then ran into this problem. I guess PLINK changed the file format, and that is why I ran into this problem?

I think this is a data input problem. Whatever file you started with, does it contain a contig named 26? If so, you should talk to whoever gave you that data and ask them what contig 26 means.

If this is coming from PLINK, I think the recoding should be: {'23': 'X', '24': 'Y', '25': 'X', '26': 'MT'}
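Spelled out as a sketch: PLINK writes the non-autosomal chromosomes as numeric codes, and 25 is its XY code for the pseudo-autosomal region of X, which is why it maps to X here. The names PLINK_CONTIG_RECODING and recode_contig are made up for illustration; the dictionary itself is what you would pass as contig_recoding.

```python
# PLINK numeric chromosome codes mapped to GRCh37 contig names.
PLINK_CONTIG_RECODING = {
    '23': 'X',   # X chromosome
    '24': 'Y',   # Y chromosome
    '25': 'X',   # XY, the pseudo-autosomal region of X
    '26': 'MT',  # mitochondrial
}


def recode_contig(contig, recoding=PLINK_CONTIG_RECODING):
    """Return the recoded contig name, or the name unchanged if it has no entry."""
    return recoding.get(contig, contig)


# With Hail installed, the same dictionary goes straight into the import call
# (path is a placeholder for your own file):
#   mt = hl.import_vcf(path, contig_recoding=PLINK_CONTIG_RECODING)
```

recode_contig is only there to check the mapping by hand on a few contig names before running the import.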

Thanks Tim, that worked.