Invalid '==' comparison, cannot compare expressions of type 'call' and 'str'

Hello,

I am attempting to run the following block of code as part of a WES variant QC script, but I continue to generate the following error. What is the correct way to probe whether the genotypes listed under ds.GT are equal to a particular string if not with the β€˜==’ operator?

Thank you in advance.

'GRNC_cond2': lambda ds: ((ds.GT == './0') | (ds.GT == './1') | (ds.GT == '0/.') | (ds.GT == '1/.'))

  File "/gpfs/fs1/home/mg1646/.conda/envs/variantqc_hailenv/lib/python3.8/site-packages/hail/expr/expressions/base_expression.py", line 781, in __eq__
    return self._compare_op("==", other)
  File "/gpfs/fs1/home/mg1646/.conda/envs/variantqc_hailenv/lib/python3.8/site-packages/hail/expr/expressions/base_expression.py", line 587, in _compare_op
    raise TypeError(f"Invalid '{op}' comparison, cannot compare expressions "
TypeError: Invalid '==' comparison, cannot compare expressions of type 'call' and 'str'

Code here:

# dict of lambdas for filter expressions
    genotype_exprs = {
        # DP: Genotypes below reading depth
        'GDP' : lambda ds: ds.DP < args.depth_threshold,
        # GDP: heterozygous genotypes outside of allelic balance
        'GAB' : lambda ds: (
            ds.GT.is_het() & (
                (ds.AD[0]/ds.DP > max(1 - args.allelic_balance, args.allelic_balance))
                | (ds.AD[0]/ds.DP < min(1 - args.allelic_balance, args.allelic_balance))
            )
        ),
        # GGQ: Genotypes below quality threshold
        'GGQ': lambda ds: ds.GQ < args.genotype_quality,
        # GNP: Male het chrX nonpar variants
        'GNP': lambda ds: ds.GT.is_het() & ds.locus.in_x_nonpar() & (ds.sa.GENDER == "Male"),
        # GRNC: Genotypes with RNC == 1
        'GRNC_cond1': lambda ds: ds.RNC[0] == "11",
        'GRNC_cond2': lambda ds: ((ds.GT == './0') | (ds.GT == './1') | (ds.GT == '0/.') | (ds.GT == '1/.'))
    }

Try using hl.Call to compare to the ds.GT field.
I.e
ds.GT == hl.Call([0, 1])

https://hail.is/docs/0.2/genetics/hail.genetics.Call.html#call

1 Like

@anh151 is totally correct.

I’ll note that Hail cannot parse ./1, 1/. , or anything else that’s β€œhalf-missing”. VCF 4.3 is not clear on whether these are permissible:

  • GT (String): Genotype, encoded as allele values separated by either of / or |. The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1, 1 | 0, or 1/2, etc. Haploid calls, e.g. on Y, male non-pseudoautosomal X, or mitochondrion, are indicated by having only one allele value. A triploid call might look like 0/0/1. If a call cannot be made for a sample at a given locus, β€˜.’ must be specified for each missing allele in the GT field (for example β€˜./.’ for a diploid genotype and β€˜.’ for haploid genotype). The meanings of the separators are as follows (see the PS field below for more details on incorporating phasing information into the genotypes):

β—¦ / : genotype unphased
β—¦ | : genotype phased

Hail will error when it sees ./1 in a VCF file. You can also use call_field[0] and call_field[1] to access the first and second allele, respectively. A triploid call supports 0, 1, and 2, etc.

The allele order in an unphased call is always smaller-index first, even if you provide a flipped textual representation.

In [5]: import hail as hl
   ...: ht = hl.Table.parallelize([hl.Struct(x='0/1'), hl.Struct(x='1/0')])
   ...: ht = ht.annotate(x = hl.parse_call(ht.x))
   ...: ht = ht.annotate(left = ht.x[0], right = ht.x[1])
   ...: ht.collect()
Out[5]: 
[Struct(x=Call(alleles=[0, 1], phased=False), left=0, right=1),
 Struct(x=Call(alleles=[0, 1], phased=False), left=0, right=1)]