Hi all,
I'm using Hail 0.2 (version 0.2.10-a4870bf102a8) with Spark 2.3.0. Startup output:
Running on Apache Spark version 2.3.0.cloudera4
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.10-a4870bf102a8
I am struggling to run Spark SQL on Hail data that I have converted to a Spark DataFrame, e.g.:
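import hail as hl
from pyspark.sql import SQLContext

sqc = SQLContext(sc)  # sc: the already-running SparkContext
final_vds = hl.import_vcf(vcf)  # vcf: path to the input VCF
df = final_vds.rows().to_spark(flatten=True)  # Hail rows table -> Spark DataFrame
df.createOrReplaceTempView("mytable")
dss = sqc.sql("select * from mytable")  # this is the call that fails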
This does not work: Spark complains that the table or view is not available. The same pattern works with an ordinary Spark DataFrame produced by a Spark ingest workflow, but not with one that comes from Hail.
Should I save the DataFrame back to HDFS, read it back in as a Spark DataFrame, and then run the SQL, or is there a better way?
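One thing that may be worth trying before a round trip through HDFS: Spark temp views are scoped to the session that registered them, so a separately constructed SQLContext(sc) might not see a view registered on the Hail-produced DataFrame. The sketch below assumes that is the cause here (not confirmed) and runs the query through df.sql_ctx, the SQLContext the DataFrame itself is bound to. The helper name query_via_owning_session is hypothetical and is not part of Hail or Spark.

```python
def query_via_owning_session(df, view_name, query):
    """Register `df` as a temp view and run `query` in the same session.

    Hypothetical helper, not part of Hail or Spark. It assumes the
    "table or view not found" error comes from querying a different
    session than the one `df` belongs to, so it uses `df.sql_ctx`
    (the SQLContext attached to the DataFrame) instead of a new
    SQLContext(sc).
    """
    # The view is registered in the catalog of the session that owns df.
    df.createOrReplaceTempView(view_name)
    # Query through that same session so the view can be resolved.
    return df.sql_ctx.sql(query)


# Usage with the DataFrame from the post (assumes `final_vds` exists):
# df = final_vds.rows().to_spark(flatten=True)
# dss = query_via_owning_session(df, "mytable", "select * from mytable")
# dss.show(5)
```

If this resolves the view, it would suggest the two contexts are backed by different sessions and no write to HDFS is needed.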
Thanks