Hi,
I’m trying to use the annotation database with the example 1kg VDS file, but I keep getting errors like the one below:
Job [c0475ca9-27ad-4e1a-833e-622e8ddeceef] submitted.
Waiting for job output...
Running on Apache Spark version 2.0.2
SparkUI available at http://10.132.0.2:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-0d9e264
2017-09-08 22:45:54 Hail: WARN: called redundant split on an already split VDS
Traceback (most recent call last):
File "/tmp/c0475ca9-27ad-4e1a-833e-622e8ddeceef/annot.py", line 10, in <module>
'va.gencode19'
File "/home/ec2-user/BuildAgent/work/4d93753832b3428a/python/hail/dataset.py", line 997, in annotate_variants_db
sqlite3.OperationalError: too many terms in compound SELECT
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [c0475ca9-27ad-4e1a-833e-622e8ddeceef] entered state [ERROR] while waiting for [DONE].
I’ve tried both this file and a sites-only VDS generated from a fairly generic variant list. Any advice would be much appreciated.
Thanks
Ah yes, sorry for not including that. The annotations requested were quite simple, e.g.:
sites_vds.annotate_variants_db([
    'va.cadd',
    'va.fantom5'
])
The "too many terms in compound SELECT" error was consistent across several attempts.
Thanks!
Can you give some more details about how you’re creating your cluster, such as what machine types and OS you’re using? Is this a vanilla Dataproc cluster?
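For context, the traceback shows the failure coming from Python's sqlite3 module rather than from Spark. SQLite caps the number of terms in one compound SELECT (the compile-time setting SQLITE_MAX_COMPOUND_SELECT, 500 by default), so the limit in effect depends on how the SQLite library on the machine was built. Below is a minimal sketch, independent of Hail, of how that error can arise and how batching avoids it. The helper names (compound_select, select_all, select_in_chunks) are hypothetical, and this is not a claim about what annotate_variants_db does internally.

```python
import sqlite3


def compound_select(values):
    """Build one 'SELECT <int>' term per value, joined with UNION ALL."""
    return " UNION ALL ".join("SELECT %d" % int(v) for v in values)


def select_all(conn, values):
    """Run everything as one compound SELECT; may exceed SQLite's term limit."""
    return [row[0] for row in conn.execute(compound_select(values))]


def select_in_chunks(conn, values, chunk_size=100):
    """Run the same query in batches so each statement stays under the limit."""
    rows = []
    for start in range(0, len(values), chunk_size):
        rows.extend(select_all(conn, values[start:start + chunk_size]))
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    values = list(range(2000))
    try:
        select_all(conn, values)
        print("single statement succeeded (this build has a higher limit)")
    except sqlite3.OperationalError as exc:
        print("single statement failed:", exc)
    print("rows returned in chunks:", len(select_in_chunks(conn, values)))
```

If the single statement fails on one machine but not another, that would point to differently built SQLite libraries, which is one reason the cluster image matters here.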