StackOverflow with concordance function


#1

I’ve been experimenting with the concordance function and am stuck. The setup is a comparison of sets of NA12878 genotypes, restricted to chr22 before the comparisons.

I have been able to use concordance to successfully compare:

  1. a single sample against itself
  2. two single samples against another 10 (2 total comparisons)
  3. ten single samples against another 10 (10 total comparisons)
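
For readers unfamiliar with what these comparisons tally, here is a minimal pure-Python sketch of a 5x5 concordance count table. It is not Hail's implementation. The category codes (0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var) follow the Hail documentation for the concordance summary as I understand it, and the function names, variant keys, and toy genotypes are all made up for illustration. It assumes biallelic variants, so a call is just an alt-allele count.

```python
from typing import Dict, List, Optional

# Category codes, assumed to match the documented concordance summary layout.
NO_DATA, NO_CALL, HOM_REF, HET, HOM_VAR = range(5)

# Hypothetical toy layout: variant -> sample -> alt-allele count (None = no call).
Dataset = Dict[str, Dict[str, Optional[int]]]


def category(dataset: Dataset, variant: str, sample: str) -> int:
    """Map one (variant, sample) entry to a category code."""
    if variant not in dataset:
        return NO_DATA  # the variant is absent from this dataset
    n_alt = dataset[variant].get(sample)
    if n_alt is None:
        return NO_CALL  # variant present, genotype missing
    return HOM_REF + n_alt  # 0 -> hom ref, 1 -> het, 2 -> hom var


def concordance_table(left: Dataset, right: Dataset, samples: List[str]) -> List[List[int]]:
    """Count (left category, right category) pairs over the union of variants."""
    table = [[0] * 5 for _ in range(5)]
    for variant in sorted(set(left) | set(right)):
        for sample in samples:
            table[category(left, variant, sample)][category(right, variant, sample)] += 1
    return table


left = {
    "22:100:A:G": {"s1": 0, "s2": 1},
    "22:200:C:T": {"s1": 2, "s2": None},
}
right = {"22:100:A:G": {"s1": 0, "s2": 0}}  # lacks the second variant

self_table = concordance_table(left, left, ["s1", "s2"])
cross_table = concordance_table(left, right, ["s1", "s2"])
```

Comparing a dataset against itself should put every count on the diagonal, which is why the self-comparison is a useful sanity check before moving to larger sample sets.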

But when I then tried to compare 100 samples against another 100, it failed with a stack overflow.
My command:
hundred_sum, hundred_samples, hundred_variants = hl.concordance(hundred, na12878_mt_biallelic_vars_chr22)
It begins by identifying the 100 matched samples:

2018-11-28 18:32:55 Hail: INFO: Found 100 overlapping samples
  Left: 100 total samples
  Right: 2437 total samples

But then fails:

	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2678)
	at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:3179)
	at java.io.ObjectInputStream.readHandle(ObjectInputStream.java:1683) 
       ...
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2177)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2068)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1572)

Hail version: 0.2-961f76d14f1e
Error summary: StackOverflowError: null

Any suggestions for what might be causing the error?


#2

Is there anything else in the stack trace, or is it just miles of repeats of the pasted bit?


#3

I’ve done a bit more investigation, and I think the concordance problem I was observing is actually a different issue related to filesystem read/write (which I will post about separately in "Errors after iterative calls to union_cols").

I have been able to complete a concordance of 100 samples just fine with a different mt file, comparing it both to itself and to a sample_id-shuffled version of itself.
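
In case it helps anyone checking a self-comparison like this, here is a rough sketch of how the global summary could be reduced to a single concordance rate. It assumes the first value returned by hl.concordance is a 5x5 nested list of counts indexed as [left][right], with 2 = hom ref, 3 = het, 4 = hom var, which is my reading of the Hail docs. The function name and the toy numbers are invented for illustration and are not output from my run.

```python
from typing import List


def concordance_rate(summary: List[List[int]]) -> float:
    """Fraction of matching genotypes among entries called in both datasets."""
    # Diagonal entries for hom ref, het, and hom var are the concordant calls.
    concordant = summary[2][2] + summary[3][3] + summary[4][4]
    # Rows and columns 2..4 cover every entry with a call on both sides.
    called_in_both = sum(sum(row[2:]) for row in summary[2:])
    if called_in_both == 0:
        raise ValueError("no genotypes were called in both datasets")
    return concordant / called_in_both


# Toy summary for a dataset compared against itself (made-up counts).
toy_summary = [
    [0, 0, 0, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 50, 0, 0],
    [0, 0, 0, 30, 0],
    [0, 0, 0, 0, 17],
]

rate = concordance_rate(toy_summary)
print(f"concordance rate: {rate:.3f}")
```

For a self-comparison the rate should come out as exactly 1, and anything lower would suggest the two inputs are not actually the same genotypes.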