Filtering for a list of rsids


#1

Hello,

I am trying to filter a Hail v0.1 VariantDataset (UKBB data) on Databricks using a compound filter expression and the filter_variants_expr method, but I am receiving a fatal error: IllegalStateException: Bytecode failed verification 2. I also tried filter_variants_list, building a representation.Variant object from each rsid’s chrm:pos:ref:alt, but ended up with no variants in the filtered VDS (no idea why).

print hail_filter_expr
(va.rsid == "rs459193") || (va.rsid == "rs6001930") || (va.rsid == "rs13078960")
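
For reference, a string of this shape can be assembled from a plain Python list of rsids. This is only a minimal sketch: the helper name `build_rsid_filter_expr` is made up for illustration and is not part of Hail, and it does nothing beyond string formatting.

```python
def build_rsid_filter_expr(rsids):
    """Join one `(va.rsid == "...")` clause per rsid with `||`."""
    if not rsids:
        raise ValueError("need at least one rsid")
    clauses = ['(va.rsid == "{}")'.format(rsid) for rsid in rsids]
    return " || ".join(clauses)


# The three rsids from this post
hail_filter_expr = build_rsid_filter_expr(["rs459193", "rs6001930", "rs13078960"])
```

The resulting string is what gets passed to filter_variants_expr further down.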

print hail_var_list
[Variant(contig=5, start=55806751, ref=A, alts=[AltAllele(ref=A, alt=G)]), Variant(contig=22, start=40876234, ref=T, alts=[AltAllele(ref=T, alt=C)]), Variant(contig=3, start=85807590, ref=T, alts=[AltAllele(ref=T, alt=G)])]
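
A sketch of how the chrm:pos:ref:alt strings behind such a list might be put together in plain Python. The tuple values are copied from the output above; the helper name `to_variant_string` is hypothetical, and turning each string into a Variant (for example with `Variant.parse` from `hail.representation`) is an assumption about the Hail 0.1 API that is left as a comment rather than executed here.

```python
# (contig, position, ref, alt) per rsid, copied from the printed list above
wanted = [
    ("5", 55806751, "A", "G"),
    ("22", 40876234, "T", "C"),
    ("3", 85807590, "T", "G"),
]


def to_variant_string(contig, pos, ref, alt):
    """Format one variant as 'contig:pos:ref:alt'."""
    return "{}:{}:{}:{}".format(contig, pos, ref, alt)


variant_strings = [to_variant_string(*v) for v in wanted]

# Assumption (not run here): each string would then become a Hail Variant, e.g.
#   hail_var_list = [Variant.parse(s) for s in variant_strings]
```

Note that the contig strings have to match the contig names stored in the dataset exactly for any variant to be matched.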

bgen_vds = hc.import_bgen(row['bgen_path'].replace('s3://', 'dbfs:/mnt/'),
                          sample_file=row['bgen_sample_path'].replace('s3://', 'dbfs:/mnt/'))
#bgen_vds_filtered = bgen_vds.filter_variants_list(hail_var_list, keep=False).persist()
bgen_vds_filtered = bgen_vds.filter_variants_expr(hail_filter_expr, keep=False).persist()
FatalError                                Traceback (most recent call last)

in ()
12 sample_file=row['bgen_sample_path'].replace('s3://', 'dbfs:/mnt/'))
13 #bgen_vds_filtered = bgen_vds.filter_variants_list(hail_var_list, keep=False).persist()
---> 14 bgen_vds_filtered = bgen_vds.filter_variants_expr(hail_filter_expr, keep=False).persist()
15
16 # assert to ensure that at least one variant found with the wanted rsid

in persist(self, storage_level)

/local_disk0/spark-1322f4fc-61d3-4a12-99d4-3cd8a85d0761/userFiles-f34fca5d-2974-4972-bb62-28c8f0ac193e/addedFile4156138757619196325dbfs__FileStore_jars_f3963890_53e6_4f02_9df8_a3dd495e3ce3_hail_0_1_spark_2_2_0_44bec-6b92a.egg/hail/java.py in handle_py4j(func, *args, **kwargs)
119 raise FatalError('%s\n\nJava stack trace:\n%s\n'
120 'Hail version: %s\n'
--> 121 'Error summary: %s' % (deepest, full, Env.hc().version, deepest))
122 except py4j.protocol.Py4JError as e:
123 if e.args[0].startswith('An error occurred while calling'):

FatalError: IllegalStateException: Bytecode failed verification 2

Java stack trace:
java.lang.IllegalStateException: Bytecode failed verification 2
at is.hail.asm4s.FunctionBuilder.classAsBytes(FunctionBuilder.scala:196)
at is.hail.asm4s.FunctionBuilder.result(FunctionBuilder.scala:208)
at is.hail.expr.CM.runWithDelayedValues(CM.scala:78)
at is.hail.expr.Parser$.is$hail$expr$Parser$$evalNoTypeCheck(Parser.scala:49)
at is.hail.expr.Parser$.is$hail$expr$Parser$$eval(Parser.scala:62)
at is.hail.expr.Parser$.evalExpr(Parser.scala:74)
at is.hail.expr.Parser$.evalTypedExpr(Parser.scala:67)
at is.hail.expr.FilterVariants.execute(Relational.scala:348)
at is.hail.variant.VariantSampleMatrix.value$lzycompute(VariantSampleMatrix.scala:490)
at is.hail.variant.VariantSampleMatrix.value(VariantSampleMatrix.scala:488)
at is.hail.variant.VariantSampleMatrix.x$13$lzycompute(VariantSampleMatrix.scala:493)
at is.hail.variant.VariantSampleMatrix.x$13(VariantSampleMatrix.scala:493)
at is.hail.variant.VariantSampleMatrix.rdd$lzycompute(VariantSampleMatrix.scala:493)
at is.hail.variant.VariantSampleMatrix.rdd(VariantSampleMatrix.scala:493)
at is.hail.variant.VariantDatasetFunctions$.withGenotypeStream$extension(VariantDataset.scala:196)
at is.hail.variant.VariantDatasetFunctions$.persist$extension(VariantDataset.scala:190)
at is.hail.variant.VariantDatasetFunctions.persist(VariantDataset.scala:182)
at sun.reflect.GeneratedMethodAccessor427.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:293)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:226)
at java.lang.Thread.run(Thread.java:748)

Hail version: 0.1-1214727
Error summary: IllegalStateException: Bytecode failed verification 2

Any insight would be greatly appreciated!

Any other suggestions on how to filter on a list of rsids is welcome as well. Thanks so much!


#2

Hi @palakpsheth!

I’m really sorry to hear you are encountering this issue. Do you have a particular reason for using Hail 0.1? Hail 0.2 has an easier-to-use interface, more functionality, and is receiving all the new features.

I have started trying to reproduce your issue in Hail 0.1. I will respond if I have more information.


#3

Ok, I can’t reproduce with Hail version 0.1-74bf1ebb. The version of Hail you are using is a little old; however, I do not think the changes from 1214727 to 74bf1ebb will fix this.

The following executes successfully for me:

from hail import *
hail_filter_expression = """(va.rsid == "rs459193") || (va.rsid == "rs6001930") || (va.rsid == "rs13078960")"""
hc = HailContext()
vds = hc.import_bgen('ukb_imp_chr22_v3.bgen')
vds.filter_variants_expr(hail_filter_expression, keep=False).count()

Can you share the hail.log file? It should reside in the working directory of the driver node.

Can you also share the output of:

print(vds.sample_schema)
print(vds.variant_schema)
print(vds.genotype_schema)