ClosedChannelException: null hail 0.2.56

Hi Hail team!

I am getting an error running the code below with Hail 0.2.56, but it ran without error in 0.2.34. Log attached.

import hail as hl

from gnomad.resources import MatrixTableResource
from gnomad.resources.grch38 import telomeres_and_centromeres
from gnomad.utils.sparse_mt import impute_sex_ploidy
from gnomad_qc.v3.resources.sample_qc import hard_filtered_samples
from gnomad_qc.v3.resources.meta import meta

hl.init(log='/hail.log', default_reference='GRCh38')


def get_gnomad_v3_mt(
        key_by_locus_and_alleles: bool = False,
) -> hl.MatrixTable:
    mt = gnomad_v3_genotypes.mt()
    if key_by_locus_and_alleles:
        mt = hl.MatrixTable(hl.ir.MatrixKeyRowsBy(mt._mir, ['locus', 'alleles'], is_sorted=True))
        
    return mt


# V3 genotype data
gnomad_v3_genotypes = MatrixTableResource("gs://gnomad/raw/hail-0.2/mt/genomes_v3/gnomad_genomes_v3.repartitioned.mt")

mt = get_gnomad_v3_mt()
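# Keep only the samples listed in the duplicate 1KG table, then impute sex ploidy,
# excluding telomere/centromere intervals.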
renamed_1kg = hl.import_table('gs://gnomad-tmp/duplicate_1kg.txt').key_by('s')
mt = mt.filter_cols(hl.is_defined(renamed_1kg[mt.col_key]))
ht = impute_sex_ploidy(
    mt,
    excluded_calling_intervals=telomeres_and_centromeres.ht()
)
ht = ht.checkpoint('gs://gnomad-tmp/sex_ploidy_duplicate_1kg.ht', overwrite=True)

Thank you in advance for your help!

hail.log (259.2 KB)

Looks like the ClosedChannelException is just masking the real error in the log:

2020-09-03 16:52:09 TaskSetManager: WARN: Lost task 118.1 in stage 0.0 (TID 134, jg3-w-0.c.maclab-ukbb.internal, executor 1): htsjdk.samtools.SAMException: Unable to load chr20(59410137, 59414233) from /tmp/fasta-reader-FcVGkbAEQi5MCYD6x7hi1B.fasta
	at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSubsequenceAt(AbstractIndexedFastaSequenceFile.java:207)
	at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:49)
	at is.hail.io.reference.FASTAReader.getSequence(FASTAReader.scala:73)
	at is.hail.io.reference.FASTAReader.fillBlock(FASTAReader.scala:83)
	at is.hail.io.reference.FASTAReader.readBlock(FASTAReader.scala:93)
	at is.hail.io.reference.FASTAReader.readBlock(FASTAReader.scala:99)
	at is.hail.io.reference.FASTAReader.lookupGlobalPos(FASTAReader.scala:138)
	at is.hail.io.reference.FASTAReader.lookup(FASTAReader.scala:110)
	at is.hail.variant.ReferenceGenome.getSequence(ReferenceGenome.scala:357)
	at __C24Compiled.__m42getReferenceSequenceFromValidLocus(Unknown Source)
	at __C24Compiled.apply(Unknown Source)
	at is.hail.expr.ir.TableFilter$$anonfun$execute$2.apply(TableIR.scala:946)
	at is.hail.expr.ir.TableFilter$$anonfun$execute$2.apply(TableIR.scala:946)
	at is.hail.expr.ir.TableValue$$anonfun$3.apply(TableValue.scala:65)
	at is.hail.expr.ir.TableValue$$anonfun$3.apply(TableValue.scala:65)
	at is.hail.rvd.RVD$$anonfun$17$$anonfun$apply$2.apply$mcZJ$sp(RVD.scala:605)
	at is.hail.rvd.RVD$$anonfun$17$$anonfun$apply$2.apply(RVD.scala:604)
	at is.hail.rvd.RVD$$anonfun$17$$anonfun$apply$2.apply(RVD.scala:604)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:464)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:66)
	at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:38)
	at is.hail.utils.package$.using(package.scala:609)
	at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:38)
	at is.hail.rvd.RVD$$anonfun$32.apply(RVD.scala:1223)
	at is.hail.rvd.RVD$$anonfun$32.apply(RVD.scala:1221)
	at is.hail.sparkextras.ContextRDD$$anonfun$crunJobWithIndex$1.apply(ContextRDD.scala:232)
	at is.hail.sparkextras.ContextRDD$$anonfun$crunJobWithIndex$1.apply(ContextRDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Thank you for finding the real error, Tim! So does that mean there is an error loading the reference sequence? Do you have any suggestions for what I could change to fix it?

I’m not totally sure. It does look like all those errors came from the same fasta block read.

Something you could do to help debug is run the following to see if the error replicates:

chr20 = mt.filter_rows(mt.locus.contig == 'chr20').rows().select()
chr20.annotate(context=chr20.locus.sequence_context())._force_count()
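(sequence_context() looks up the reference base at each locus, so this exercises the same FASTA lookup that appears in the stack trace above.)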

With that I get this error: TypeError: Reference genome 'GRCh38' does not have a sequence loaded. Use 'add_sequence' to load the sequence from a FASTA file.
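A quick check (assuming has_sequence is the right call here) shows the built-in GRCh38 reference has no sequence attached until one is loaded:

hl.get_reference('GRCh38').has_sequence()  # False before add_sequence is called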

If I change it a little, like this, I just get a number (60337758) and no error:

from gnomad.utils.reference_genome import get_reference_genome

chr20 = mt.filter_rows(mt.locus.contig == 'chr20').rows().select()
ref = get_reference_genome(chr20.locus, add_sequence=True)

chr20 = chr20.key_by(
    locus=hl.locus(contig=chr20.locus.contig, pos=chr20.locus.position, reference_genome=ref)
)
chr20.annotate(context=chr20.locus.sequence_context())._force_count()

But maybe there is a different (and likely better) way to add the reference genome sequence that I should try.

Above your chr20 = ... line, add:

hl.get_reference('GRCh38')\
  .add_sequence('gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz')

Thank you, Tim! That is much cleaner. It still gives no error, just the number 60337758.
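For reference, the combined check now looks roughly like this:

import hail as hl

hl.get_reference('GRCh38')\
  .add_sequence('gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz')

chr20 = mt.filter_rows(mt.locus.contig == 'chr20').rows().select()
chr20 = chr20.annotate(context=chr20.locus.sequence_context())
print(chr20._force_count())  # prints 60337758, no error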

Hmmm… weird. Maybe try without the filter?

No error, count is 2861561184

I now see what’s going on in impute_sex_ploidy; I’ll see if I can replicate it.

Thank you so much Tim!

We’ve opened https://github.com/hail-is/hail/pull/9427 to fix another issue we found when working on this.

Unfortunately we haven’t had any luck replicating your exact issue, and I’m not sure my PR will fix it.