Hey there. We are trying to force the locus field to use GRCh38, but plotting raises an error about GRCh37.
script
import hail as hl
hl.init()
hl.get_reference(name='GRCh38')
# ^ just in case this is treated like an environment variable
ctbl = hl.import_table(
    'ctbl_export.tsv',
    types={
        'P': hl.tfloat32,
        'locus': hl.tlocus(reference_genome='GRCh38'),
    },
)
hl.plot.manhattan(ctbl.P, ctbl.locus)
error
Hail version: 0.2.16-6da0d3571629
/hail/utils/java.py", line 240, in deco
'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
Error summary: HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
2019-07-08 17:32:27 Hail: INFO: Reading table with no type imputation
Loading column 'CHROM' as type 'str' (type not specified)
Loading column 'POS' as type 'str' (type not specified)
Loading column 'ID' as type 'str' (type not specified)
Loading column 'REF' as type 'str' (type not specified)
Loading column 'ALT' as type 'str' (type not specified)
Loading column 'A1' as type 'str' (type not specified)
Loading column 'TEST' as type 'str' (type not specified)
Loading column 'OBS_CT' as type 'str' (type not specified)
Loading column 'OR' as type 'str' (type not specified)
Loading column 'SE' as type 'str' (type not specified)
Loading column 'Z_STAT' as type 'str' (type not specified)
Loading column 'P' as type 'float32' (user-specified)
Loading column 'PHENO' as type 'str' (type not specified)
Loading column 'GC' as type 'str' (type not specified)
Loading column 'QQ' as type 'str' (type not specified)
Loading column 'BONF' as type 'str' (type not specified)
Loading column 'info_score_freq' as type 'str' (type not specified)
Loading column 'info_score_col' as type 'str' (type not specified)
Loading column 'info_score_information' as type 'str' (type not specified)
Loading column 'info_score_hg19_chrom' as type 'str' (type not specified)
Loading column 'info_score_hg19_pos' as type 'str' (type not specified)
Loading column 'info_score_hg19_qStrand' as type 'str' (type not specified)
Loading column 'info_score_hg19_liftoverStatus' as type 'str' (type not specified)
Loading column 'gene_symbol' as type 'str' (type not specified)
Loading column 'VEP_Max_Impact' as type 'str' (type not specified)
Loading column 'VEP_max_consequence' as type 'str' (type not specified)
Loading column 'locus' as type 'locus<GRCh38>' (user-specified)
Traceback (most recent call last):
File "/var/folders/1z/0ky89yln5rx86068554kzgk00000gn/T/Rtmp07GDXE/chunk-code-62bf5a694648.txt", line 14, in <module>
hl.plot.manhattan(ctbl.P, ctbl.locus)
File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-1362>", line 2, in manhattan
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/plot/plots.py", line 1378, in manhattan
contig_ticks = hail.eval([hail.locus(contig, int(ref.lengths[contig]/2)).global_position() for contig in observed_contigs])
File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-514>", line 2, in eval
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 190, in eval
return eval_timed(expression)[0]
File "</Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/decorator.py:decorator-gen-512>", line 2, in eval_timed
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 156, in eval_timed
return Env.backend().execute(expression._ir, True)
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/Users/user/Desktop/py3_jupyter_rconnect/lib/python3.7/site-packages/hail/utils/java.py", line 240, in deco
'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
Java stack trace:
is.hail.utils.HailException: Invalid locus 'chr1:124478211' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:75)
at is.hail.variant.ReferenceGenome.checkLocus(ReferenceGenome.scala:249)
at is.hail.codegen.generated.C7.method3(Unknown Source)
at is.hail.codegen.generated.C7.method1(Unknown Source)
at is.hail.codegen.generated.C7.apply(Unknown Source)
at is.hail.codegen.generated.C7.apply(Unknown Source)
at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12$$anonfun$apply$2.apply(CompileAndEvaluate.scala:99)
at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12$$anonfun$apply$2.apply(CompileAndEvaluate.scala:85)
at is.hail.utils.package$.using(package.scala:597)
at is.hail.annotations.Region$.scoped(Region.scala:11)
at is.hail.expr.ir.CompileAndEvaluate$$anonfun$12.apply(CompileAndEvaluate.scala:85)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:20)
at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:84)
at is.hail.backend.Backend.execute(Backend.scala:86)
at is.hail.backend.Backend.executeJSON(Backend.scala:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
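A possible workaround, sketched under the assumption that the failure comes from the plotting code and not from the import: the traceback above stops at the line in plots.py that builds contig_ticks with hail.locus(contig, ...) and no reference genome argument, so those loci presumably fall back to the session default, GRCh37. hl.get_reference only returns a reference object and does not change that default. Passing default_reference to hl.init may avoid the error on this version. The wrapper name manhattan_grch38 is hypothetical.

```python
def manhattan_grch38(tsv_path):
    """Build a Manhattan plot for a TSV whose 'locus' column is GRCh38."""
    # Imported lazily so that defining this helper does not start Hail.
    import hail as hl

    # Assumption: making GRCh38 the session default lets loci created
    # without an explicit reference genome be checked against GRCh38.
    hl.init(default_reference='GRCh38')

    ctbl = hl.import_table(
        tsv_path,
        types={
            'P': hl.tfloat32,
            'locus': hl.tlocus(reference_genome='GRCh38'),
        },
    )
    return hl.plot.manhattan(ctbl.P, ctbl.locus)
```

Calling manhattan_grch38('ctbl_export.tsv') would replace the script above. Other Hail releases may expect the default reference to be set differently, so check the hl.init documentation for the installed version.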
Hi all, I am trying to load my VCF file using this command: hl.import_vcf(myvcf).write('hailvcf.mt', overwrite=True)
And I get the error below: Error summary: HailException: Invalid locus 'chr12:97324738' found. Contig 'chr12' is not in the reference genome 'GRCh37'.
In GRCh37, contigs are named 1, 2, …, 22, X, and Y. In GRCh38, the contigs are named chr1, chr2, … chrX, chrY. By default import_vcf assumes you have GRCh37 data. If your data is encoded in GRCh38, you should specify that in import_vcf using the reference_genome parameter.
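A minimal sketch of the GRCh38 case, assuming the VCF really is aligned to GRCh38. The helper name import_grch38_vcf and both path arguments are placeholders.

```python
def import_grch38_vcf(vcf_path, out_path):
    """Import a GRCh38 VCF and write it out as a Hail MatrixTable."""
    # Imported lazily so that defining this helper does not start Hail.
    import hail as hl

    # reference_genome overrides the default reference (GRCh37) that
    # import_vcf would otherwise validate loci against.
    mt = hl.import_vcf(vcf_path, reference_genome='GRCh38')
    mt.write(out_path, overwrite=True)
    return out_path
```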
If your data is encoded in GRCh37 but erroneously has the chr prefix, you can remove it using contig_recoding, for example:
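A sketch of such a recoding, assuming only the autosomes plus X and Y need renaming. The names recode and import_grch37_vcf are illustrative and the path is a placeholder.

```python
# Map 'chr1'..'chr22', 'chrX', 'chrY' to the GRCh37 names '1'..'22', 'X', 'Y'.
recode = {f"chr{c}": str(c) for c in list(range(1, 23)) + ['X', 'Y']}


def import_grch37_vcf(vcf_path):
    """Import a GRCh37 VCF whose contigs carry a stray 'chr' prefix."""
    # Imported lazily so building the mapping above does not require Hail.
    import hail as hl

    # contig_recoding renames contigs as the file is read, so the loci
    # match the contig names in GRCh37.
    return hl.import_vcf(
        vcf_path,
        reference_genome='GRCh37',
        contig_recoding=recode,
    )
```

Mitochondrial records (often named chrM in prefixed files) are not covered by this mapping and would need their own entry.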
@danking, thanks for the help. I followed the instructions you gave me earlier, and now this is the error I get: Error summary: HailException: Invalid locus 'chr6_apd_hap1:838122' found. Contig 'chr6_apd_hap1' is not in the reference genome 'GRCh37'.
Do I need to replace all occurrences of, let's say, 'chr6' with '6'?
These are hg19 "alternative contigs". Hail doesn't support these contigs, and most current Hail users do not use them for association analysis. You can explicitly remove them with a regular expression passed to the filter argument (e.g. filter="chr6_apd_hap1|chr6_cox_hap2|..."). You can also remove all invalid loci with skip_invalid_loci=True. If you use skip_invalid_loci=True, you should verify that your dataset still contains all the contigs you expect. There are many ways to explore this; I recommend starting with:
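A sketch of one such starting point, assuming the goal is to count rows per contig after import and compare against the contigs you expect. The helper names and the expected-contig list are hypothetical.

```python
def missing_contigs(contig_counts, expected):
    """Return the expected contigs that have no rows in the dataset."""
    return [c for c in expected if contig_counts.get(c, 0) == 0]


def count_contigs(vcf_path):
    """Count rows per contig after dropping records with invalid loci."""
    # Imported lazily so missing_contigs can be used without Hail.
    import hail as hl

    # Records whose locus is not valid for the reference genome are skipped.
    mt = hl.import_vcf(vcf_path, skip_invalid_loci=True)
    # hl.agg.counter tallies how many rows carry each contig name.
    return mt.aggregate_rows(hl.agg.counter(mt.locus.contig))
```

For example, missing_contigs(count_contigs('my.vcf'), [str(i) for i in range(1, 23)] + ['X', 'Y']) lists any expected contig that ended up with no rows. Pass contig_recoding to import_vcf as well if the file still carries the chr prefix.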
Hi, what about the error "FatalError: HailException: Invalid locus '23:205383' found. Contig '23' is not in the reference genome 'GRCh37'"? Do you know how I can solve this?
Is 23 supposed to refer to the X chromosome? The reference genome GRCh37 uses X for that chromosome, not 23. You can probably use the contig_recoding argument on import_vcf to fix this: contig_recoding={'23': 'X', '24': 'Y', '25': 'MT'} or something.
After doing what you wrote, I now get the error: Error summary: HailException: Invalid locus '26:3396' found. Contig '26' is not in the reference genome 'GRCh37'.
Because I got an error when applying LD pruning in Hail, I did the pruning in PLINK, passed the PLINK output to hwe_normalized_pca(), and then ran into this problem. I guess PLINK changed the file format and that is why I ran into this problem?
I think this is a data input problem. Whatever file you started with, does it contain a contig named 26? If so, you should talk to whoever gave you that data and ask them what contig 26 means.
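If the file did come out of PLINK, one thing worth checking: PLINK conventionally writes numeric chromosome codes, with 23 for X, 24 for Y, 25 for the pseudo-autosomal region (XY), and 26 for mitochondrial DNA. A sketch of a recoding under that assumption follows. Whether it applies to this dataset is not confirmed anywhere in the thread, and the names below are hypothetical.

```python
# Assumed PLINK numeric codes mapped to GRCh37 contig names.
# Code 25 (pseudo-autosomal XY) is folded into X here; that is one
# possible choice, not a rule.
PLINK_TO_GRCH37 = {'23': 'X', '24': 'Y', '25': 'X', '26': 'MT'}


def import_plink_coded_vcf(vcf_path):
    """Import a VCF that uses PLINK-style numeric sex/MT contig codes."""
    # Imported lazily so the mapping above can be inspected without Hail.
    import hail as hl

    return hl.import_vcf(
        vcf_path,
        reference_genome='GRCh37',
        contig_recoding=PLINK_TO_GRCH37,
    )
```

It is still worth confirming with the data provider that 26 really means mitochondrial DNA before relying on this mapping.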