Hi, hail community.
What I am trying to do it to reduce the size of a GVCF by applying:
- Creating a sparse matrix with all my samples using
combine_gvcfs
. - Get a separated matrix, from that sparse matrix, for each family (a family can be of one to three samples)
a. Densify that family sparse matrix
b. Remove all regions that are homozygous referent for all the members - Recombined all the densified family matrices into a single matrix
While I can do it using some toy examples, this process fails in production because it cannot save the densified matrix, but the sparse matrix is properly written to disk.
My code follows:
# Part 2
denseByFamily = []
for idx, fam in enumerate( experiments_by_family.keys() ):
sam = hl.literal( experiments_by_family[ fam ], 'array<str>' )
familyMatrix = sparseMatrix.filter_cols( sam.contains( sparseMatrix['s'] ) )
familyMatrix = hl.experimental.densify( familyMatrix )
familyMatrix = familyMatrix.annotate_rows( nH = hl.agg.count_where( familyMatrix.LGT.is_hom_ref() ) )
familyMatrix = familyMatrix.filter_rows( familyMatrix.nH < familyMatrix.count_cols() )
denseByFamily.append( familyMatrix )
# Part 3
denseMatrix = denseByFamily[ 0 ]
for ii in range(1 , len( denseByFamily ) ):
denseMatrix = full_outer_join_mt( denseMatrix, denseByFamily[ ii ] )
denseMatrix.write( outputDenseMatrix, overwrite = True )
Where `experiments_by_family is a dictionary having as key the family ids and as value a list of samples.
But this raises the following exceptions:
Traceback (most recent call last):
File "/apps/pipeline/main.py", line 342, in <module>
main(sqlContext, main_conf, chrom, nchroms, step, somaticFlag)
File "/apps/pipeline/main.py", line 136, in main
combine.createDenseMatrix( url_project, prefix_hdfs, max_items_batch, denseMatrix_path, gvcf_store_path, chrom, group, token, gpap_id, gpap_token, save_family_dense = False )
File "/apps/pipeline/vcfLoader-0.1-py3.6.egg/rdconnect/combine.py", line 119, in createDenseMatrix
File "/usr/local/lib/python3.6/site-packages/hail/experimental/full_outer_join_mt.py", line 33, in full_outer_join_mt
left = left.select_rows(left_row=left.row)
File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 1155, in select_rows
return self._select_rows(caller, new_row)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-1062>", line 2, in _select_rows
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 3465, in _select_rows
return cleanup(MatrixTable(MatrixMapRows(base._mir, row._ir)))
File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 568, in __init__
indices=self._row_indices)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-530>", line 2, in construct_reference
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3916, in construct_reference
return construct_expr(ir, type, indices)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3907, in construct_expr
return typ_to_expr[type.__class__](ir, type, indices, aggregations)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-462>", line 2, in __init__
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1467, in __init__
self._aggregations)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3907, in construct_expr
return typ_to_expr[type.__class__](ir, type, indices, aggregations)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-462>", line 2, in __init__
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1470, in __init__
self._aggregations)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
[...]
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 1470, in __init__
self._aggregations)
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-528>", line 2, in construct_expr
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/expr/expressions/typed_expressions.py", line 3904, in construct_expr
elif type in scalars:
File "/usr/local/lib/python3.6/site-packages/hail/expr/types.py", line 186, in __hash__
return 43 + hash(str(self))
File "/usr/local/lib/python3.6/site-packages/hail/expr/types.py", line 1410, in __str__
return "locus<{}>".format(escape_parsable(str(self.reference_genome)))
RecursionError: maximum recursion depth exceeded while calling a Python object
I0318 16:33:33.804427 359 sched.cpp:2166] Asked to stop the driver
I0318 16:33:33.804949 100 sched.cpp:1204] Stopping framework c2f58ccb-52ae-4a55-9c46-f870f92bb209-1307
Build step 'Execute shell' marked build as failure
Finished: FAILURE
So, since the RecursionError: maximum recursion depth exceeded
I tried to make some intermediate steps, changing the Part 3 to:
# Part 3 - updated
denseMatrix = denseByFamily[ 0 ]
for ii in range(1 , len( denseByFamily ) ):
denseMatrix = full_outer_join_mt( denseMatrix, denseByFamily[ ii ] )
if ii % 5 == 0 and saveMidStep:
denseMatrix.write( outputDenseMatrix, overwrite = True )
denseMatrix.write( outputDenseMatrix, overwrite = True )
And it worked for the first 30 out of 97 families, with 1 to 3 members, of my current “test set”. But then raised:
Traceback (most recent call last):
File "/apps/pipeline/main.py", line 342, in <module>
main(sqlContext, main_conf, chrom, nchroms, step, somaticFlag)
File "/apps/pipeline/main.py", line 136, in main
combine.createDenseMatrix( url_project, prefix_hdfs, max_items_batch, denseMatrix_path, gvcf_store_path, chrom, group, token, gpap_id, gpap_token, save_family_dense = False )
File "/apps/pipeline/vcfLoader-0.1-py3.6.egg/rdconnect/combine.py", line 126, in createDenseMatrix
File "</usr/local/lib/python3.6/site-packages/decorator.py:decorator-gen-1034>", line 2, in write
File "/usr/local/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.6/site-packages/hail/matrixtable.py", line 2508, in write
Env.backend().execute(MatrixWrite(self._mir, writer))
File "/usr/local/lib/python3.6/site-packages/hail/backend/backend.py", line 109, in execute
result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/apps/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o563.executeJSON.
: java.lang.StackOverflowError
at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:98)
at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
at is.hail.utils.richUtils.RichIterator$.foreachBetween$extension(RichIterator.scala:32)
at is.hail.utils.richUtils.RichIterable.foreachBetween(RichIterable.scala:12)
at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:402)
[...]
at is.hail.utils.richUtils.RichIterable.foreachBetween(RichIterable.scala:12)
at is.hail.expr.ir.Pretty$.is$hail$expr$ir$Pretty$$pretty$1(Pretty.scala:402)
at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
at is.hail.expr.ir.Pretty$$anonfun$is$hail$expr$ir$Pretty$$pretty$1$6.apply(Pretty.scala:402)
at is.hail.utils.richUtils.RichIterator$.foreachBetween$extension(RichIterator.scala:32)
I0319 08:57:46.325139 4185 sched.cpp:2166] Asked to stop the driver
I0319 08:57:46.325407 105 sched.cpp:1204] Stopping framework c2f58ccb-52ae-4a55-9c46-f870f92bb209-1313
Build step 'Execute shell' marked build as failure
Finished: FAILURE
What I am doing wrong?
Thanks in advance,
~Carles