Breaking change to filter_variants_intervals
We’ve pushed some updates to the filter_variants_intervals method. This involves building out functionality on the Interval class and adding a new IntervalTree class. filter_variants_intervals now takes one of these objects (an Interval or an IntervalTree) instead of a file path. This is a breaking change, but the old behavior is still available by reading the file into an IntervalTree first:
Filtering from an interval file, old style:
>>> vds = vds.filter_variants_intervals('my_interval_list.txt')
Filtering from an interval file, as of today:
>>> vds = vds.filter_variants_intervals(IntervalTree.read('my_interval_list.txt'))
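The usual reason to put intervals in a tree rather than a plain list is to answer "does this variant fall inside any interval?" quickly. Below is a minimal, hypothetical stand-in written in plain Python. It is not Hail's IntervalTree. The class name `SimpleIntervalIndex`, the one-`contig:start-end`-per-line file format, and the half-open `[start, end)` interval semantics are all assumptions. It is also not a true interval tree: it merges overlapping intervals per contig and binary-searches the sorted starts, which answers the same point query in O(log n) per variant.

```python
import bisect
from collections import defaultdict


class SimpleIntervalIndex:
    """Hypothetical stand-in for an interval collection (not Hail's IntervalTree).

    Overlapping intervals are merged per contig and kept sorted, so a point
    query is a binary search rather than a scan over every interval.
    """

    def __init__(self, intervals):
        # intervals: iterable of (contig, start, end), treated as half-open [start, end)
        by_contig = defaultdict(list)
        for contig, start, end in intervals:
            if end > start:
                by_contig[contig].append((start, end))
        self._starts = {}
        self._ends = {}
        for contig, spans in by_contig.items():
            spans.sort()
            merged = []
            for start, end in spans:
                if merged and start <= merged[-1][1]:
                    # Overlaps or touches the previous span: extend it.
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            self._starts[contig] = [s for s, _ in merged]
            self._ends[contig] = [e for _, e in merged]

    @classmethod
    def from_lines(cls, lines):
        # Assumed file format: one 'contig:start-end' per line; blank lines
        # and lines starting with '#' are skipped.
        def parse(line):
            contig, _, span = line.strip().partition(":")
            start, _, end = span.partition("-")
            return contig, int(start), int(end)

        return cls(parse(line) for line in lines
                   if line.strip() and not line.lstrip().startswith("#"))

    def contains(self, contig, position):
        starts = self._starts.get(contig)
        if not starts:
            return False
        # Index of the last merged span starting at or before `position`.
        i = bisect.bisect_right(starts, position) - 1
        return i >= 0 and position < self._ends[contig][i]


# Usage: the lines would normally come from open('my_interval_list.txt').
index = SimpleIntervalIndex.from_lines(["1:100-200", "1:150-300", "2:10-20"])
variants = [("1", 120), ("1", 300), ("2", 15), ("3", 1)]
kept = [v for v in variants if index.contains(*v)]
print(kept)  # [('1', 120), ('2', 15)]
```

Merging first keeps the lookup simple: once no two spans on a contig overlap, only the span immediately to the left of a position can contain it.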
# New functionality
The new filter_variants_intervals method takes either an Interval or an IntervalTree object instead of a file path. We’ve also added parse functions on Interval (parse) and IntervalTree (parse_all), which let you easily construct interval objects from strings like these:
>>> from hail.representation import Interval
# A single-chromosome interval: chromosome 1, positions 150 to 250
>>> interval = Interval.parse('1:150-250')
# Chromosome-spanning interval: 1:150 to 2:350
>>> interval = Interval.parse('1:150-2:350')
# All of chromosome X
>>> interval = Interval.parse('X')
# All of the autosomes (beginning of chromosome 1 to end of chromosome 22)
>>> interval = Interval.parse('1-22')
# Unit suffixes: M = 10^6, K (or k) = 10^3
>>> interval = Interval.parse('16:29.5M-30.2M')
>>> interval = Interval.parse('15:200k-300k')
# Special keywords: start and end
>>> interval = Interval.parse('1:150M-END')
>>> interval = Interval.parse('19:start-30000000')
# Use an interval to filter a VDS
>>> vds = vds.filter_variants_intervals(interval)
# Subset to the autosomes
>>> vds = vds.filter_variants_intervals(Interval.parse('1-22'))
# Subset to just chromosome 19
>>> vds = vds.filter_variants_intervals(Interval.parse('19'))
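To make the string syntax above concrete, here is a minimal, hypothetical parser for the same forms in plain Python. It is a sketch, not Hail's implementation. The contig list, the 1-based `START`, the `END` sentinel (real code would use contig lengths from a reference genome), the half-open `[start, end)` semantics, and the names `Locus`, `LocusInterval` and `parse_interval` are all assumptions for illustration. Suffixed positions go through `Decimal` so that `30.2M` becomes exactly 30200000 instead of picking up floating-point error.

```python
from decimal import Decimal
from typing import NamedTuple

# Assumed contig order (names only); needed to compare loci across contigs,
# e.g. for '1:150-2:350' or '1-22'.
CONTIGS = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]
CONTIG_INDEX = {name: i for i, name in enumerate(CONTIGS)}

START = 1       # assumed 1-based first position of a contig
END = 10**12    # sentinel for "end of contig"; real code would use contig lengths


class Locus(NamedTuple):
    contig: str
    position: int

    def key(self):
        # Sort key: contig order first, then position.
        return (CONTIG_INDEX[self.contig], self.position)


class LocusInterval(NamedTuple):
    start: Locus
    end: Locus

    def contains(self, locus):
        # Half-open [start, end) semantics are an assumption of this sketch.
        return self.start.key() <= locus.key() < self.end.key()


def parse_position(text):
    """Parse '150', '29.5M', '200k', 'start' or 'END' into an integer position."""
    t = text.strip().lower()
    if t == "start":
        return START
    if t == "end":
        return END
    multiplier = 1
    if t.endswith("m"):
        multiplier, t = 10**6, t[:-1]
    elif t.endswith("k"):
        multiplier, t = 10**3, t[:-1]
    value = Decimal(t) * multiplier  # exact decimal arithmetic, no float rounding
    if value != value.to_integral_value():
        raise ValueError(f"position is not a whole number: {text!r}")
    return int(value)


def check_contig(name):
    name = name.strip()
    if name not in CONTIG_INDEX:
        raise ValueError(f"unknown contig: {name!r}")
    return name


def parse_interval(text):
    """Parse the interval string forms used in the examples."""
    left, dash, right = text.strip().partition("-")
    if ":" not in left:
        # 'X' (a whole contig) or '1-22' (a range of whole contigs)
        first = check_contig(left)
        last = check_contig(right) if dash else first
        start, end = Locus(first, START), Locus(last, END)
    else:
        if not dash:
            raise ValueError(f"missing end position: {text!r}")
        contig, _, pos = left.partition(":")
        start = Locus(check_contig(contig), parse_position(pos))
        if ":" in right:
            # Contig-spanning form, e.g. '1:150-2:350'
            end_contig, _, end_pos = right.partition(":")
            end = Locus(check_contig(end_contig), parse_position(end_pos))
        else:
            # Same-contig form, e.g. '16:29.5M-30.2M'
            end = Locus(start.contig, parse_position(right))
    if end.key() < start.key():
        raise ValueError(f"interval ends before it starts: {text!r}")
    return LocusInterval(start, end)
```

For example, `parse_interval('1-22').contains(Locus('X', 100))` is False, because X sorts after chromosome 22 in the assumed contig order.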
In case you’re still not sold on this awesome interface, take a look at this blog post to see how interval filtering can make your analysis much, much faster.