Hi Team,
Sorry if this is the wrong thread for posting a liftover issue; let me know if I should open a new topic. I would appreciate your advice on the scenario below:
I was trying to use Hail's liftover on a specific dataset to convert from GRCh38 to GRCh37, and I am getting null pointer exceptions.
Below is the code I am using:
import hail as hl

rg37 = hl.get_reference('GRCh37')
rg38 = hl.get_reference('GRCh38')
# Register the GRCh38 -> GRCh37 chain file on the source genome
rg38.add_liftover('gs://hail-common/references/grch38_to_grch37.over.chain.gz', rg37)
# Build a GRCh38 locus from the chr/pos fields and lift it over to GRCh37
ht = ht.annotate(new_pos=hl.liftover(hl.locus(ht.chr, ht.pos, 'GRCh38'), 'GRCh37'))
Error Details:
TaskSetManager: WARN: Lost task 4.0 in stage 0.0 (TID 22, , executor 8): java.lang.NullPointerException
at is.hail.codegen.generated.C0.method1(Unknown Source)
at is.hail.codegen.generated.C0.apply(Unknown Source)
at is.hail.codegen.generated.C0.apply(Unknown Source)
at is.hail.expr.TableMapRows$$anonfun$65$$anonfun$apply$32.apply(Relational.scala:2080)
at is.hail.expr.TableMapRows$$anonfun$65$$anonfun$apply$32.apply(Relational.scala:2079)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:121)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:112)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
I tried the same code on a different dataset and it worked, so I am wondering whether this could be a data issue. Is there a way to get more detailed logging?
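In case it helps narrow things down, here is the kind of sanity check I was planning to run on the failing table (just a sketch; chr and pos are the field names from my code above, and I am only assuming that a missing or unrecognized value there could be the trigger):

import hail as hl

# Count rows where the fields feeding hl.locus() are missing,
# since a missing chr or pos seems like a plausible trigger
n_missing = ht.filter(hl.is_missing(ht.chr) | hl.is_missing(ht.pos)).count()
print('rows with missing chr/pos:', n_missing)

# Flag contig names that GRCh38 does not recognize
rg38 = hl.get_reference('GRCh38')
valid_contigs = hl.literal(set(rg38.contigs))
n_bad_contig = ht.filter(~valid_contigs.contains(ht.chr)).count()
print('rows with unrecognized contigs:', n_bad_contig)

Does a check along these lines make sense, or is there a better way to pinpoint the offending rows?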
Thanks in advance,
Aditya Pandit