Filter to contig

Hey hail team,

What is the fastest way to filter a large MatrixTable to a single contig? I have found the approach below to be faster than using filter_intervals alone, but wanted to see if there is a better way:

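import hail as hl

# mt_path, contig and n_partitions are assumed to be defined earlier
mt = hl.read_matrix_table(mt_path)
# Keep only the rows whose locus falls in the interval parsed from the contig string
mt = hl.filter_intervals(mt, [hl.parse_locus_interval(contig)])
# Compute n_partitions intervals over the filtered rows (private method, may change between versions)
intervals = mt._calculate_new_partitions(n_partitions)
# Re-read the MatrixTable, using those intervals as the partitioning
mt = hl.read_matrix_table(mt_path, _intervals=intervals)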

Thanks!

Sorry, missed this yesterday. The same amount of data is read in both cases; the difference is the partitioning. The _calculate_new_partitions strategy might give you more parallelism downstream, which could certainly help. This looks fine.
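To illustrate the point about partitioning, here is a toy model in plain Python. It is not Hail's implementation: the helper names (overlapping_partitions, even_partitions) and the row counts are made up for illustration, and partitions are modeled simply as half-open row-index ranges. It only shows that both approaches cover the same rows, while re-partitioning can split them into more, evenly sized pieces.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open row-index range [start, end)


def overlapping_partitions(partitions: List[Range], lo: int, hi: int) -> List[Range]:
    """Keep the parts of existing partitions that overlap [lo, hi).

    Toy stand-in for filtering by interval: the original partition
    boundaries are kept, so the number of partitions is whatever
    happens to overlap the contig.
    """
    return [(max(s, lo), min(e, hi)) for s, e in partitions if s < hi and e > lo]


def even_partitions(lo: int, hi: int, n: int) -> List[Range]:
    """Split [lo, hi) into n contiguous ranges of near-equal size.

    Toy stand-in for computing new partitions over the filtered rows.
    """
    total = hi - lo
    bounds = [lo + (total * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


# Hypothetical dataset: 1000 rows stored in 4 partitions of 250 rows each.
original = [(0, 250), (250, 500), (500, 750), (750, 1000)]
# Hypothetical contig occupying rows 200 up to (not including) 500.
contig_lo, contig_hi = 200, 500

filtered = overlapping_partitions(original, contig_lo, contig_hi)
repartitioned = even_partitions(contig_lo, contig_hi, 6)

rows_filtered = sum(e - s for s, e in filtered)
rows_repartitioned = sum(e - s for s, e in repartitioned)

print(len(filtered), [e - s for s, e in filtered])
print(len(repartitioned), [e - s for s, e in repartitioned])
```

In this toy case both layouts cover the same 300 rows, but the filtered layout has two uneven partitions while the re-partitioned layout has six equal ones, which is the kind of extra downstream parallelism mentioned above.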


thanks!!