I am trying to load a Hail Table from a simple pandas DataFrame that has a column of dates with dtype datetime64[ns], but it fails with a Spark SQL error:
FatalError: MatchError: TimestampType (of class org.apache.spark.sql.types.TimestampType$)
import hail as hl
import pandas as pd
import numpy as np

df = pd.DataFrame(data={'EID': ['1', '2', '3'], 'VISIT': ['2008-11-06', '2008-09-09', '2008-11-25'], 'BMI': [27, 34, 26]})
df.dtypes
# EID      object
# VISIT    object
# BMI       int64
# dtype: object

df['VISIT'] = pd.to_datetime(df.VISIT)
df.dtypes
# EID              object
# VISIT    datetime64[ns]
# BMI               int64
# dtype: object

ht = hl.Table.from_pandas(df, key='EID')
ht.show()
FatalError: MatchError: TimestampType (of class org.apache.spark.sql.types.TimestampType$)
Java stack trace:
scala.MatchError: TimestampType (of class org.apache.spark.sql.types.TimestampType$)
at is.hail.expr.SparkAnnotationImpex$.importType(AnnotationImpex.scala:29)
at is.hail.expr.SparkAnnotationImpex$$anonfun$importType$1.apply(AnnotationImpex.scala:39)
at is.hail.expr.SparkAnnotationImpex$$anonfun$importType$1.apply(AnnotationImpex.scala:39)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at is.hail.expr.SparkAnnotationImpex$.importType(AnnotationImpex.scala:39)
at is.hail.backend.spark.SparkBackend.pyFromDF(SparkBackend.scala:403)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Hail version: 0.2.57-582b2e31b8bd
Error summary: MatchError: TimestampType (of class org.apache.spark.sql.types.TimestampType$)
Any ideas how to convert the dates to a format that Hail can understand, or do I have to keep them as strings?
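A minimal sketch of the workaround I have in mind, assuming the failure comes from Spark mapping the datetime64[ns] column to TimestampType, which this Hail version's SparkAnnotationImpex.importType does not match (see the trace above). The idea is to convert the column in pandas before calling hl.Table.from_pandas, either to ISO date strings or to int64 seconds since the Unix epoch, so Spark sees a string or long column instead. As far as I know, Hail has no dedicated date type, so one of these two representations is probably what you end up with anyway. The helper name `to_hail_friendly` is hypothetical, and the Hail calls in the trailing comment (in particular `hl.experimental.strptime`) are assumptions to check against your Hail version.

```python
import pandas as pd


def to_hail_friendly(df: pd.DataFrame, as_epoch_seconds: bool = False) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every datetime64
    column is converted to strings or int64 epoch seconds, so that Spark
    does not infer TimestampType for it."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            if as_epoch_seconds:
                # Epoch in the column's own time zone; naive values are treated as UTC.
                epoch = pd.Timestamp('1970-01-01', tz=out[col].dt.tz)
                # timedelta // 1 second -> whole seconds as int64 (assumes no NaT)
                out[col] = (out[col] - epoch) // pd.Timedelta(seconds=1)
            else:
                # ISO-8601 date strings such as '2008-11-06'
                out[col] = out[col].dt.strftime('%Y-%m-%d')
    return out


df = pd.DataFrame(data={'EID': ['1', '2', '3'],
                        'VISIT': ['2008-11-06', '2008-09-09', '2008-11-25'],
                        'BMI': [27, 34, 26]})
df['VISIT'] = pd.to_datetime(df.VISIT)

df_str = to_hail_friendly(df)                         # VISIT as 'YYYY-MM-DD' strings
df_int = to_hail_friendly(df, as_epoch_seconds=True)  # VISIT as int64 seconds

# With Hail installed, either frame should then import (untested assumption):
#   import hail as hl
#   ht = hl.Table.from_pandas(df_str, key='EID')
#   # Optionally parse the strings inside Hail, assuming hl.experimental.strptime exists:
#   ht = ht.annotate(VISIT_SECONDS=hl.experimental.strptime(ht.VISIT, '%Y-%m-%d', 'UTC'))
```

Strings keep the dates human-readable in ht.show(), while epoch seconds make date arithmetic (differences between visits, sorting) plain integer operations in Hail.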