"found allele outside of expected range" error

Hi there!

Whenever I try to call mt.write(), this error shows up:

HailException: found allele outside of expected range [0, 2]: 3

My matrix table has been through split_multi_hts(), vep(), and variant_qc().
I had no problem calling show() on this matrix table; the error only appears when I try to write() it to disk.

I want to know:

  1. What does the error message mean? Is it saying that, although mt.alleles only contains alleles 0, 1, and 2, mt.GT contains a genotype that refers to an allele outside this range (such as 3)?

  2. What should I do to fix this?

Lazy evaluation

One of the most challenging pieces of Hail’s learning curve is its lazy evaluation. This means that Hail doesn’t actually run any computation until it hits a method that needs a result, such as write, aggregate, export, or show.

In the following pipeline:

mt = hl.import_vcf(path)
mt.write('...')

If the VCF at path is malformed outside the header, you’d see an error in the second line, not the first, because Hail hasn’t actually parsed the VCF until the second line. This is not great from a usability perspective, but it affords us two advantages:

  1. Scalability: Hail can process more data than can fit in memory
  2. Performance: Hail can employ a query optimizer that rewrites full queries to do minimal work.
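To make the "error shows up on the second line" behaviour concrete, here is a toy sketch in plain Python. It is only an analogy and not how Hail is implemented: `LazyPipeline`, `map`, and `write` are hypothetical names made up for this example, and no Hail calls are involved.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Hail class)."""

    def __init__(self, source, steps=()):
        self._source = source        # raw input records
        self._steps = tuple(steps)   # functions recorded but not yet run

    def map(self, fn):
        # Only records the step; nothing is computed here.
        return LazyPipeline(self._source, self._steps + (fn,))

    def write(self):
        # The "action": every recorded step runs now, record by record.
        out = []
        for record in self._source:
            for fn in self._steps:
                record = fn(record)
            out.append(record)
        return out


raw = ["1", "2", "oops"]             # the last record is malformed
parsed = LazyPipeline(raw).map(int)  # no error yet: "oops" has not been parsed

try:
    parsed.write()
    error_site = None
except ValueError as exc:
    # The failure surfaces at the action, not where the step was declared.
    error_site = "write"
    message = str(exc)

clean = LazyPipeline(["1", "2"]).map(int).write()
```

The bad record is only touched inside `write()`, so that is where the exception is raised, even though the mistake was introduced earlier in the pipeline.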

This particular error

I think this error in particular indicates that there is an unexpected allele somewhere, probably in split_multi_hts. Despite the lazy evaluation, we try to capture the Python stack trace of the original call site that triggered the error. Can you paste the full stack trace (especially the Python part, if there is one)?
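As a rough illustration of what "an allele outside of the expected range" can look like, here is a plain-Python sketch that flags genotype calls whose allele index does not point into the row's allele list. The record layout (dicts with `alleles` and `GT` keys) and the helper `find_bad_genotypes` are assumptions made up for this example, not Hail's data model or API, and I have not checked whether the range printed in the Hail message is inclusive.

```python
def find_bad_genotypes(rows):
    """Return (row_index, sample, allele_index) for calls that point past the allele list."""
    bad = []
    for i, row in enumerate(rows):
        n_alleles = len(row["alleles"])  # REF plus all ALT alleles
        for sample, gt in row["GT"].items():
            if gt is None:               # missing call, nothing to check
                continue
            for allele_index in gt:
                # Valid indices are assumed to be 0 .. n_alleles - 1.
                if not 0 <= allele_index < n_alleles:
                    bad.append((i, sample, allele_index))
    return bad


# Hypothetical toy data: the second row has three alleles (indices 0, 1, 2),
# but sample s2 carries a call that refers to allele index 3.
rows = [
    {"alleles": ["A", "T"], "GT": {"s1": (0, 1), "s2": None}},
    {"alleles": ["G", "C", "GA"], "GT": {"s1": (1, 2), "s2": (0, 3)}},
]

problems = find_bad_genotypes(rows)
```

If a check along these lines turns up rows in the data before splitting, the input itself is probably inconsistent; if not, the bad index is more likely introduced by a later step.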

Hi Tim!
Thank you so much for the clear explanation! I think I understand Hail better now.

I've copied and pasted the Python stack trace below. Should I also post the Java stack trace?

---------------------------------------------------------------------------

   FatalError                                Traceback (most recent call last)
   /usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
       700                 type_pprinters=self.type_printers,
       701                 deferred_pprinters=self.deferred_printers)
   --> 702             printer.pretty(obj)
       703             printer.flush()
       704             return stream.getvalue()

   /usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
       392                         if cls is not object \
       393                                 and callable(cls.__dict__.get('__repr__')):
   --> 394                             return _repr_pprint(obj, self, cycle)
       395 
       396             return _default_pprint(obj, self, cycle)

   /usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
       698     """A pprint that just redirects to the normal repr function."""
       699     # Find newlines and replace them with p.break_()
   --> 700     output = repr(obj)
       701     lines = output.splitlines()
       702     with p.group():

   ~/.local/lib/python3.6/site-packages/hail/table.py in __repr__(self)
      1295 
      1296         def __repr__(self):
   -> 1297             return self.__str__()
      1298 
      1299         def data(self):

   ~/.local/lib/python3.6/site-packages/hail/table.py in __str__(self)
      1292 
      1293         def __str__(self):
   -> 1294             return self._ascii_str()
      1295 
      1296         def __repr__(self):

   ~/.local/lib/python3.6/site-packages/hail/table.py in _ascii_str(self)
      1318                 return s
      1319 
   -> 1320             rows, has_more, dtype = self.data()
      1321             fields = list(dtype)
      1322             trunc_fields = [trunc(f) for f in fields]

   ~/.local/lib/python3.6/site-packages/hail/table.py in data(self)
      1302                 row_dtype = t.row.dtype
      1303                 t = t.select(**{k: hl._showstr(v) for (k, v) in t.row.items()})
   -> 1304                 rows, has_more = t._take_n(self.n)
      1305                 self._data = (rows, has_more, row_dtype)
      1306             return self._data

   ~/.local/lib/python3.6/site-packages/hail/table.py in _take_n(self, n)
      1449             has_more = False
      1450         else:
   -> 1451             rows = self.take(n + 1)
      1452             has_more = len(rows) > n
      1453             rows = rows[:n]

   <decorator-gen-1119> in take(self, n, _localize)

   ~/.local/lib/python3.6/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
       612     def wrapper(__original_func, *args, **kwargs):
       613         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
   --> 614         return __original_func(*args_, **kwargs_)
       615 
       616     return wrapper

   ~/.local/lib/python3.6/site-packages/hail/table.py in take(self, n, _localize)
      2119         #""" 
      2120 
   -> 2121         return self.head(n).collect(_localize)
      2122 
      2123     @typecheck_method(n=int)

   <decorator-gen-1113> in collect(self, _localize)

   ~/.local/lib/python3.6/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
       612     def wrapper(__original_func, *args, **kwargs):
       613         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
   --> 614         return __original_func(*args_, **kwargs_)
       615 
       616     return wrapper

   ~/.local/lib/python3.6/site-packages/hail/table.py in collect(self, _localize)
      1918         e = construct_expr(rows_ir, hl.tarray(t.row.dtype))
      1919         if _localize:
   -> 1920             return Env.backend().execute(e._ir)
      1921         else:
      1922             return e

   ~/.local/lib/python3.6/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
        96                 raise HailUserError(message_and_trace) from None
        97 
   ---> 98             raise e

   ~/.local/lib/python3.6/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
        72         # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))
        73         try:
   ---> 74             result = json.loads(self._jhc.backend().executeJSON(jir))
        75             value = ir.typ._from_json(result['value'])
        76             timings = result['timings']

   ~/.local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
      1255         answer = self.gateway_client.send_command(command)
      1256         return_value = get_return_value(
   -> 1257             answer, self.gateway_client, self.target_id, self.name)
      1258 
      1259         for temp_arg in temp_args:

   ~/.local/lib/python3.6/site-packages/hail/backend/py4j_backend.py in deco(*args, **kwargs)
        30                 raise FatalError('%s\n\nJava stack trace:\n%s\n'
        31                                  'Hail version: %s\n'
   ---> 32                                  'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
        33         except pyspark.sql.utils.CapturedException as e:
        34             raise FatalError('%s\n\nJava stack trace:\n%s\n'

   FatalError: HailException: found allele outside of expected range [0, 2]: 3