ExpressionException: Cannot combine expressions from different source objects

Hi,

I am doing identical operations with two datasets - nisc.ht and cidr.ht:

nisc = hl.read_table('file:///NISC.ht') 
cidr = hl.read_table('file:///cidr.ht')

mt = hl.import_vcf('file:///grch38_test.vcf', reference_genome='GRCh38', force_bgz=True)
mt = hl.split_multi_hts(mt.annotate_rows(locus_old=mt.locus, alleles_old=mt.alleles), permit_shuffle=True)
mt_aIndex = mt.a_index 

nisc_ac = nisc[mt.row_key].info.AC 
cidr_ac = cidr[mt.row_key].info.AC

Now, the following operation with nisc_ac works fine and I can output it using export, but with cidr_ac it fails:

nisc_ac_bool = hl.len(nisc_ac) < mt_aIndex
cidr_ac_bool = hl.len(cidr_ac) < mt_aIndex  # fails!

The error given is the following:

In [21]: cidr_ac_bool = hl.len(cidr_ac) < mt_aIndex
---------------------------------------------------------------------------
ExpressionException                       Traceback (most recent call last)
<ipython-input-21-7ce5ca00f04c> in <module>
----> 1 cidr_ac_bool = hl.len(cidr_ac) < mt_aIndex

</opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/decorator.py:decorator-gen-662> in __lt__(self, other)

~/.conda/envs/py37/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    575     def wrapper(__original_func, *args, **kwargs):
    576         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577         return __original_func(*args_, **kwargs_)
    578 
    579     return wrapper

~/.conda/envs/py37/lib/python3.7/site-packages/hail/expr/expressions/typed_expressions.py in __lt__(self, other)
   1989             ``True`` if the left side is smaller than the right side.
   1990         """
-> 1991         return self._bin_op_numeric("<", other, lambda _: tbool)
   1992 
   1993     @typecheck_method(other=expr_numeric)

~/.conda/envs/py37/lib/python3.7/site-packages/hail/expr/expressions/base_expression.py in _bin_op_numeric(self, name, other, ret_type_f)
    565         else:
    566             ret_type = unified_type
--> 567         return me._bin_op(name, other, ret_type)
    568 
    569     def _bin_op_numeric_reverse(self, name, other, ret_type_f=None):

~/.conda/envs/py37/lib/python3.7/site-packages/hail/expr/expressions/base_expression.py in _bin_op(self, name, other, ret_type)
    575     def _bin_op(self, name, other, ret_type):
    576         other = to_expr(other)
--> 577         indices, aggregations = unify_all(self, other)
    578         if (name in {'+', '-', '*', '/', '//'}) and (ret_type in {tint32, tint64, tfloat32, tfloat64}):
    579             op = ir.ApplyBinaryPrimOp(name, self._ir, other._ir)

~/.conda/envs/py37/lib/python3.7/site-packages/hail/expr/expressions/base_expression.py in unify_all(*exprs)
    351                 n=len(sources),
    352                 fields=''.join("\n        {}: {}".format(src, fds) for src, fds in sources.items())
--> 353             )) from None
    354     first, rest = exprs[0], exprs[1:]
    355     aggregations = first._aggregations

ExpressionException: Cannot combine expressions from different source objects.
    Found fields from 2 objects:
        <hail.matrixtable.MatrixTable object at 0x7f2371adce10>: ['locus', 'alleles']
        <hail.matrixtable.MatrixTable object at 0x7f23717ac990>: ['a_index']

Why is this happening?

Also, I tried to determine the dimensions of nisc_ac and cidr_ac, but nothing I tried gives the number of rows in them (hl.len, count(), and count_rows() all fail). Of course, I could just use nisc.count(), but it's a bit awkward.

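I think you're getting tripped up by saving Hail expressions into Python variables. An expression stays tied to the exact table object it was built from, and operations such as split_multi_hts or annotate_rows return a new object. The error lists two different MatrixTable objects, so my guess is that mt was rebuilt (for example, by re-running the import line) between creating cidr_ac and mt_aIndex, which left them pointing at different objects. You'll have a much easier time if you store intermediate results as fields on the matrix table and refer to them as mt.<field>. Do this instead: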

import hail as hl

nisc = hl.read_table('file:///NISC.ht') 
cidr = hl.read_table('file:///cidr.ht')

mt = hl.import_vcf('file:///grch38_test.vcf', reference_genome='GRCh38', force_bgz=True)
mt = hl.split_multi_hts(mt.annotate_rows(locus_old=mt.locus, alleles_old=mt.alleles), permit_shuffle=True)
mt = mt.annotate_rows(
    nisc_ac = nisc[mt.row_key].info.AC,
    cidr_ac = cidr[mt.row_key].info.AC
)
# Reference the new fields through mt; bare nisc_ac / cidr_ac are not defined here.
mt = mt.annotate_rows(
    nisc_ac_bool = hl.len(mt.nisc_ac) < mt.a_index,
    cidr_ac_bool = hl.len(mt.cidr_ac) < mt.a_index
)
mt.rows().show()
print(mt.filter_rows(mt.nisc_ac_bool).count_rows())
print(mt.filter_rows(mt.cidr_ac_bool).count_rows())