Exception(s) when writing dense matrix after multiple merges of densified matrices

Hi, hail community.

What I am trying to do is to reduce the size of a GVCF by applying the following steps:

  1. Create a sparse matrix with all my samples using combine_gvcfs.
  2. Get a separate matrix, from that sparse matrix, for each family (a family can have one to three samples):
    a. Densify that family's sparse matrix.
    b. Remove all regions that are homozygous reference for all the members.
  3. Recombine all the densified family matrices into a single matrix.

While I can do this with some toy examples, the process fails in production: the sparse matrix is properly written to disk, but the densified matrix cannot be saved.

My code follows:

import hail as hl
from hail.experimental import full_outer_join_mt

# Part 2
denseByFamily = []
for idx, fam in enumerate( experiments_by_family.keys() ):
    sam = hl.literal( experiments_by_family[ fam ], 'array<str>' )
    familyMatrix = sparseMatrix.filter_cols( sam.contains( sparseMatrix['s'] ) )
    familyMatrix = hl.experimental.densify( familyMatrix )
    familyMatrix = familyMatrix.annotate_rows( nH = hl.agg.count_where( familyMatrix.LGT.is_hom_ref() ) )
    familyMatrix = familyMatrix.filter_rows( familyMatrix.nH < familyMatrix.count_cols() )
    denseByFamily.append( familyMatrix )

# Part 3
denseMatrix = denseByFamily[ 0 ] 
for ii in range(1 , len( denseByFamily ) ):
    denseMatrix = full_outer_join_mt( denseMatrix, denseByFamily[ ii ] )
denseMatrix.write( outputDenseMatrix, overwrite = True )

Where `experiments_by_family` is a dictionary whose keys are the family IDs and whose values are lists of sample names.
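
For illustration (the family IDs and sample names below are made up), the structure is simply:

experiments_by_family = {
    'FAM01': [ 'SampleA', 'SampleB', 'SampleC' ],  # a trio
    'FAM02': [ 'SampleD' ]                         # a single-sample family
}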

But this raises the following exception:

Traceback (most recent call last):
  File "/apps/pipeline/main.py", line 342, in <module>
    main(sqlContext, main_conf, chrom, nchroms, step, somaticFlag)
  File "/apps/pipeline/main.py", line 136, in main
    combine.createDenseMatrix( url_project, prefix_hdfs, max_items_batch, denseMatrix_path, gvcf_store_path, chrom, group, token, gpap_id, gpap_token, save_family_dense = False )
  File "/apps/pipeline/vcfLoader-0.1-py3.6.egg/rdconnect/combine.py", line 119, in createDenseMatrix
  File "/usr/local/lib/python3.6/site-packages/hail/experimental/full_outer_join_mt.py", line 33, in full_outer_join_mt
    left = left.select_rows(left_row=left.row)
  File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 1155, in select_rows
    return self._select_rows(caller, new_row)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-1062>", line 2, in _select_rows
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 3465, in _select_rows
    return cleanup(MatrixTable(MatrixMapRows(base._mir, row._ir)))
  File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 568, in __init__
    indices=self._row_indices)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-530>", line 2, in construct_reference
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3916, in construct_reference
    return construct_expr(ir, type, indices)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3907, in construct_expr
    return typ_to_expr[type.__class__](ir, type, indices, aggregations)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-462>", line 2, in __init__
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1467, in __init__
    self._aggregations)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3907, in construct_expr
    return typ_to_expr[type.__class__](ir, type, indices, aggregations)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-462>", line 2, in __init__
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1470, in __init__
    self._aggregations)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  
  [...]
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1470, in __init__
    self._aggregations)
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3904, in construct_expr
    elif type in scalars:
  File "/usr/local/lib/python3.6/site-packages/hail/expr/types.py", line 186, in __hash__
    return 43 + hash(str(self))
  File "/usr/local/lib/python3.6/site-packages/hail/expr/types.py", line 1410, in __str__
    return "locus<{}>".format(escape_parsable(str(self.reference_genome)))
RecursionError: maximum recursion depth exceeded while calling a Python object
I0318 16:33:33.804427   359 sched.cpp:2166] Asked to stop the driver
I0318 16:33:33.804949   100 sched.cpp:1204] Stopping framework c2f58ccb-52ae-4a55-9c46-f870f92bb209-1307
Build step 'Execute shell' marked build as failure
Finished: FAILURE

So, given the RecursionError: maximum recursion depth exceeded, I tried adding some intermediate steps, changing Part 3 to:

# Part 3 - updated
denseMatrix = denseByFamily[ 0 ] 
for ii in range(1 , len( denseByFamily ) ):
    denseMatrix = full_outer_join_mt( denseMatrix, denseByFamily[ ii ] )
    if ii % 5 == 0 and saveMidStep:  # saveMidStep: boolean flag set elsewhere
        denseMatrix.write( outputDenseMatrix, overwrite = True )
denseMatrix.write( outputDenseMatrix, overwrite = True )

This worked for the first 30 of the 97 families (each with 1 to 3 members) in my current “test set”, but then it raised:

Traceback (most recent call last):
  File "/apps/pipeline/main.py", line 342, in <module>
    main(sqlContext, main_conf, chrom, nchroms, step, somaticFlag)
  File "/apps/pipeline/main.py", line 136, in main
    combine.createDenseMatrix( url_project, prefix_hdfs, max_items_batch, denseMatrix_path, gvcf_store_path, chrom, group, token, gpap_id, gpap_token, save_family_dense = False )
  File "/apps/pipeline/vcfLoader-0.1-py3.6.egg/rdconnect/combine.py", line 126, in createDenseMatrix
  File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-1034>", line 2, in write
  File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 2508, in write
    Env.backend().execute(MatrixWrite(self._mir, writer))
  File "/usr/local/lib/python3.6/site-packages/hail/backend/backend.py", line 109, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o563.executeJSON.
: java.lang.StackOverflowError
    at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:98)
    at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
    at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
    at is.hail.utils.richUtils.RichIterator$.foreachBetween$extension(RichIterator.scala:32)
    at is.hail.utils.richUtils.RichIterable.foreachBetween(RichIterable.scala:12)
    at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:402)
    [...]
    at is.hail.utils.richUtils.RichIterable.foreachBetween(RichIterable.scala:12)
    at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:402)
    at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
    at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
    at is.hail.utils.richUtils.RichIterator$.foreachBetween$extension(RichIterator.scala:32)

I0319 08:57:46.325139  4185 sched.cpp:2166] Asked to stop the driver
I0319 08:57:46.325407   105 sched.cpp:1204] Stopping framework c2f58ccb-52ae-4a55-9c46-f870f92bb209-1313

Build step 'Execute shell' marked build as failure
Finished: FAILURE

What am I doing wrong?

Thanks in advance,
~Carles

These two errors are symptoms of the same problem: a computational graph that is too large/deep. This isn’t your fault; we should handle these kinds of massive queries gracefully by using data structures on the heap instead of the function stack, both in Python and in the Scala backend.

How many families do you have? This part is likely triggering the depth error:

denseMatrix = denseByFamily[ 0 ] 
for ii in range(1 , len( denseByFamily ) ):
    denseMatrix = full_outer_join_mt( denseMatrix, denseByFamily[ ii ] )
denseMatrix.write( outputDenseMatrix, overwrite = True )

It’s possible that redesigning this to do a tree merge could help: something like the union_cols tree merge from the linked question, but with full_outer_join_mt instead of union_cols.
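
A rough sketch of what that pairwise tree merge could look like (an illustration only, not the exact code from the linked question; it reuses the denseByFamily list from the code above):

from hail.experimental import full_outer_join_mt

def tree_merge( mts ):
    # join the matrices in pairs, level by level, so the join depth grows
    # as log2(len(mts)) instead of linearly with len(mts)
    while len( mts ) > 1:
        paired = [ full_outer_join_mt( mts[ jj ], mts[ jj + 1 ] )
                   for jj in range( 0, len( mts ) - 1, 2 ) ]
        if len( mts ) % 2 == 1:
            paired.append( mts[ -1 ] )  # odd count: carry the last matrix over
        mts = paired
    return mts[ 0 ]

denseMatrix = tree_merge( denseByFamily )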

So, you suggest applying the join pairwise, in a tree fashion, right?

Reusing the code from the linked questions, I would do it like this:

    mts_ = dense_by_family[:]
    ii = 0
    while len( mts_ ) > 1:
        ii += 1
        print( 'Compression {0}/{1}'.format( ii, len( mts_ ) ) )
        tmp = []
        for jj in range( 0, len( mts_ ) - 1, 2 ):
            tmp.append( full_outer_join_mt( mts_[ jj ], mts_[ jj + 1 ] ) )
        if len( mts_ ) % 2 == 1:
            tmp.append( mts_[ -1 ] )  # odd count: the last matrix passes through
        mts_ = tmp[:]
    [ dense_matrix ] = mts_
    del mts_

Am I right?

Thanks,
~Carles