Changes to filter_variants_intervals; filter intervals programmatically


#1

Breaking change to filter_variants_intervals

We’ve pushed some updates to the filter_variants_intervals method. They build out functionality on the Interval class and add a new IntervalTree class. filter_variants_intervals now takes one of these objects (an Interval or an IntervalTree) instead of a file path. This is a breaking change, but the old behavior is still available, as shown below.

Filtering from an interval file, old style:

>>> vds = vds.filter_variants_intervals('my_interval_list.txt')

Filtering from an interval file, new style:

>>> from hail.representation import IntervalTree
>>> vds = vds.filter_variants_intervals(IntervalTree.read('my_interval_list.txt'))
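Conceptually, an interval tree answers one question quickly: does a given locus fall inside any interval in the set? The plain-Python sketch below illustrates that idea. It is not Hail's implementation. It assumes each line of the interval file has the form `contig:start-end` with inclusive positions (Hail's reader may accept other formats). It also swaps the tree for a simpler structure: intervals are sorted and merged per contig, then queried with binary search. `SimpleIntervalSet`, `from_lines`, and `contains` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict


class SimpleIntervalSet:
    """Hypothetical stand-in for an interval tree: sorted, merged intervals per contig."""

    def __init__(self, intervals):
        # Group (start, end) pairs by contig.
        by_contig = defaultdict(list)
        for contig, start, end in intervals:
            by_contig[contig].append((start, end))
        self._starts = {}
        self._ends = {}
        for contig, spans in by_contig.items():
            spans.sort()
            merged = []
            for start, end in spans:
                # Merge overlapping or adjacent intervals (inclusive coordinates assumed).
                if merged and start <= merged[-1][1] + 1:
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            self._starts[contig] = [s for s, _ in merged]
            self._ends[contig] = [e for _, e in merged]

    @classmethod
    def from_lines(cls, lines):
        # Assumed format: one 'contig:start-end' per line; blank lines and '#' comments are skipped.
        intervals = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            contig, span = line.split(':', 1)
            start, end = span.split('-', 1)
            intervals.append((contig, int(start), int(end)))
        return cls(intervals)

    def contains(self, contig, position):
        starts = self._starts.get(contig)
        if not starts:
            return False
        # Index of the last merged interval whose start is <= position.
        i = bisect_right(starts, position) - 1
        return i >= 0 and position <= self._ends[contig][i]
```

For example, `SimpleIntervalSet.from_lines(['1:100-200', '1:150-300']).contains('1', 250)` is true, because the two overlapping lines are merged into a single interval, 1:100-300.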

New functionality
filter_variants_intervals now takes an Interval or IntervalTree object rather than a file path. To make these objects easy to build, we’ve also added parsing functions, Interval.parse and IntervalTree.parse_all, which construct intervals from strings like these:

>>> from hail.representation import Interval

# Chromosome 1, positions 150 to 250
>>> interval = Interval.parse('1:150-250')

# Chromosome-spanning interval: 1:150 to 2:350 
>>> interval = Interval.parse('1:150-2:350')

# All of chromosome X
>>> interval = Interval.parse('X')

# All of the autosomes (beginning of chromosome 1 to the end of chromosome 22)
>>> interval = Interval.parse('1-22')

# Unit suffixes: M = 10^6, k = 10^3
>>> interval = Interval.parse('16:29.5M-30.2M')
>>> interval = Interval.parse('15:200k-300k')

# Special keywords: start and end
>>> interval = Interval.parse('1:150M-END')
>>> interval = Interval.parse('19:start-30000000')

# Use an interval to filter a VDS
>>> vds = vds.filter_variants_intervals(interval)

# subset to autosomes
>>> vds = vds.filter_variants_intervals(Interval.parse('1-22'))

# subset to just chromosome 19
>>> vds = vds.filter_variants_intervals(Interval.parse('19'))
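The strings above follow a small grammar. An interval is a contig, optionally followed by a position. A position is either a number with an optional k (10^3) or M (10^6) suffix, or one of the keywords start and end. The sketch below is a hypothetical re-implementation of that grammar for illustration only, not Hail's parser. It makes several assumptions. It uses placeholder contig lengths, since real lengths come from the reference genome. Positions are 1-based and inclusive. Suffixes and keywords are treated as case-insensitive. The result is plain `((contig, pos), (contig, pos))` tuples. `parse_position`, `parse_interval`, and `PLACEHOLDER_LENGTHS` are hypothetical names, and edge cases may be handled differently by the real parser.

```python
# Placeholder contig lengths for illustration only; NOT real reference lengths.
PLACEHOLDER_LENGTHS = {str(i): 250_000_000 for i in range(1, 23)}
PLACEHOLDER_LENGTHS['X'] = 250_000_000


def parse_position(text, contig, lengths):
    """Parse '150', '200k', '29.5M', 'start', or 'end' into an integer position."""
    if contig not in lengths:
        raise ValueError('unknown contig: %r' % contig)
    t = text.strip().lower()
    if t == 'start':
        return 1  # assumes 1-based coordinates
    if t == 'end':
        return lengths[contig]
    multiplier = 1
    if t.endswith('k'):
        multiplier, t = 1_000, t[:-1]
    elif t.endswith('m'):
        multiplier, t = 1_000_000, t[:-1]
    # round() guards against float artifacts such as 30.2 * 10**6.
    return int(round(float(t) * multiplier))


def parse_interval(text, lengths=PLACEHOLDER_LENGTHS):
    """Parse 'X', '1-22', '1:150-250', '1:150-2:350', '16:29.5M-30.2M', ..."""
    parts = text.strip().split('-')
    if len(parts) > 2:
        raise ValueError('expected at most one "-": %r' % text)
    left = parts[0]
    if ':' in left:
        start_contig, pos = left.split(':', 1)
        start = (start_contig, parse_position(pos, start_contig, lengths))
    else:
        start_contig = left
        start = (left, parse_position('start', left, lengths))
    if len(parts) == 1:
        if ':' in left:
            raise ValueError('single positions are not handled in this sketch')
        # A bare contig means the whole contig.
        return start, (left, parse_position('end', left, lengths))
    right = parts[1]
    if ':' in right:
        # Explicit end contig, e.g. '1:150-2:350'.
        end_contig, pos = right.split(':', 1)
        end = (end_contig, parse_position(pos, end_contig, lengths))
    elif ':' in left:
        # End position on the same contig, e.g. '1:150-250'.
        end = (start_contig, parse_position(right, start_contig, lengths))
    else:
        # Contig range, e.g. '1-22': runs to the end of the right-hand contig.
        end = (right, parse_position('end', right, lengths))
    return start, end
```

For example, `parse_interval('16:29.5M-30.2M')` returns `(('16', 29500000), ('16', 30200000))`. This sketch splits on `-` naively, so contig names that themselves contain a hyphen would need a smarter tokenizer.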

In case you’re still not sold on this awesome interface, take a look at this blog post to see how interval filtering can make your analysis much, much faster.
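The blog post isn’t reproduced here, but one general reason interval filters can be fast is this: when records are sorted by locus, the records inside an interval form one contiguous run, and that run can be located by binary search instead of testing every record. The sketch below shows the principle on a sorted Python list of `(contig, position)` tuples. It illustrates the general idea under that sortedness assumption and does not describe how Hail executes the filter. `loci_in_range` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def loci_in_range(sorted_loci, contig, start, end):
    """Return the (contig, position) tuples with start <= position <= end on contig.

    Assumes sorted_loci is sorted by (contig, position) and contains 2-tuples only.
    Two binary searches find the boundaries of the matching run, so locating it
    costs O(log n) rather than the O(n) of a full scan.
    """
    lo = bisect_left(sorted_loci, (contig, start))
    hi = bisect_right(sorted_loci, (contig, end))
    return sorted_loci[lo:hi]
```

The contig ordering only needs to be consistent (here it is plain string order), because each query stays within a single contig.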


#2

Just a note: this function is currently named filter_intervals.