I get a NegativeArraySizeException when loading a PLINK file

Hi! I get a NegativeArraySizeException when I load a PLINK file.

import hail as hl
mt = hl.import_plink('/path/to/plink-file')
mt.count_rows()

Some of the stack trace looks like this:

        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.resize(IdentityObjectIntMap.java:427)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:227)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:228)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
        at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)

How can I fix this?

This is caused by an issue with two libraries we use, Spark and Kryo. We’re working on a long-term fix. For now, you can try starting your Spark cluster with an extra properties argument:

--properties 'spark:spark.executor.extraJavaOptions=-XX:hashCode=0,spark:spark.driver.extraJavaOptions=-XX:hashCode=0'

If you are running locally, you can try:

export PYSPARK_SUBMIT_ARGS="--driver-java-options '-XX:hashCode=0' --conf 'spark.executor.extraJavaOptions=-XX:hashCode=0' pyspark-shell"
ipython # or jupyter notebook
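
If you would rather set this from Python (for example in the first cell of a notebook), something like the following may work. This is only a sketch: it assumes PYSPARK_SUBMIT_ARGS is read when PySpark launches the JVM, so it has to be set before Hail is initialized in that process. The helper name hashcode_submit_args is made up for illustration and is not part of Hail or Spark.

```python
import os

# JVM flag from the workaround above.
JVM_FLAG = "-XX:hashCode=0"


def hashcode_submit_args(flag=JVM_FLAG):
    """Build the same string as the `export PYSPARK_SUBMIT_ARGS=...` line.

    Hypothetical helper, not a Hail or Spark API.
    """
    return (
        f"--driver-java-options '{flag}' "
        f"--conf 'spark.executor.extraJavaOptions={flag}' "
        "pyspark-shell"
    )


# Assumption: this must run before the JVM starts, i.e. before Hail
# is initialized. Setting it afterwards has no effect on a running JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = hashcode_submit_args()

# After this, import hail and load the PLINK file as in the question.
```

If Hail has already been initialized in the session, restart the Python process (or the notebook kernel) first so the new options are picked up.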