Thanks. I guess that I will have to finish the data export soon.
I was running the command to convert a VDS to CSV (from this thread). An error is fired when I define the columns that I want to export to CSV. I think that I need some kind of definition import for the keyTable (v, va), but I might be wrong. Could you please take a look?
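From reading the KeyTable docs, I am wondering whether select expects a list of column names rather than a single string; if that is the case, the fix might be as simple as this (just a guess on my side, not verified):

ktv_select = ktv.annotate('js = json({v: v, va: va})').select(['js'])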
Thanks,
eilalan
The script (I broke the command into multiple smaller commands to find out where the issue is; the line that fires the error is marked with a comment below):
from hail import *
import json
print("hc")
hc = HailContext()
# read the VDS
# convert the VDS to CSV
print("vds")
vds = hc.read('gs://data_gnomad_orielresearch/gnomad.exomes.r2.0.1.sites.Y.vds')
print("finished read - Read .vds files as variant dataset.")
#########################
## VARIANTS ANNOTATION
######################
print("running variants_keytable")
ktv = vds.variants_keytable()
print("ktv is a key table with variants and variant annotations. ktv is of type KeyTable")
print("Add new columns computed from existing columns. Select a subset of columns.")
ktv_select = ktv.annotate('js = json({v: v, va: va})').select('js')  # <-- this line fires the error
print("ktv_select is of type KeyTable")
print("Converts this key table to a Spark DataFrame.")
ktv_df = ktv_select.to_dataframe()
print("ktv_df is of type pyspark.sql.DataFrame")
print("write ktv_df to csv")
ktv_df.write.csv("gs://data_gnomad_orielresearch/vdsToCsv_variants.csv")
print("gs://data_gnomad_orielresearch/vdsToCsv_variants.csv was written - check it")
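A side question, in case it matters: the js column is a JSON string that itself contains commas, so I am not sure write.csv is the right writer for it. If it causes trouble, I was thinking of writing the single string column as plain text instead (untested on my side, and the .txt path is just a placeholder):

ktv_df.write.text("gs://data_gnomad_orielresearch/vdsToCsv_variants.txt")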
The error trace:
File "/tmp/47957ee3-bdc5-4add-955b-2a14062b1534/convertVDSToCSV.py", line 32, in <module>
ktv_select = ktv.annotate('js = json({v: v, va: va})').select('js')
File "<string>", line 2, in select
File "/home/ec2-user/BuildAgent/work/c38e75e72b769a7c/python/hail/java.py", line 119, in handle_py4j
hail.java.FatalError: An error occurred while calling into JVM, probably due to invalid parameter types.
Java stack trace:
An error occurred while calling o66.select. Trace:
py4j.Py4JException: Method select([class java.lang.String, class java.util.ArrayList]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Hail version: devel-fff80b1
Error summary: An error occurred while calling into JVM, probably due to invalid parameter types.
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [47957ee3-bdc5-4add-955b-2a14062b1534] entered state [ERROR] while waiting for [DONE].
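One more thing I plan to try, to confirm what select expects on this devel build: printing the method's docstring from the driver script (plain Python, so it should run anywhere the script runs):

print("select docstring:")
print(ktv.select.__doc__)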