Ld_prune starts and stops error

Hi-
I’m writing a demo notebook that uses 1000 Genomes data (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz). I am running into an issue with ld_prune. After running the following code:

import hail as hl

vcf_paths = ['ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.bgz']
geno = hl.import_vcf(vcf_paths, min_partitions=100)
geno = hl.split_multi_hts(geno)
geno = hl.variant_qc(geno)
geno = geno.filter_rows(geno.variant_qc.AF[1] > 0.1)
pruned_variant_table = hl.ld_prune(geno.GT, r2 = 0.2, bp_window_size=500000, block_size = 1024)

I am getting this response:

2019-05-30 09:23:04 Hail: INFO: ld_prune: running local pruning stage with max queue size of 99274 variants
2019-05-30 09:23:10 Hail: INFO: wrote table with 9233 rows in 108 partitions to file:/tmp/hail.OfibaQ8KrJ2V/JSRMCTpZKa
2019-05-30 09:23:19 Hail: INFO: Wrote all 30 blocks of 9232 x 2504 matrix with block size 1024.

With this error message:


ValueError Traceback (most recent call last)
in
----> 1 pruned_variant_table = hl.ld_prune(geno.GT, r2 = 0.2, bp_window_size=500000, block_size = 1024)

</Users/tmajaria/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1417> in ld_prune(call_expr, r2, bp_window_size, memory_per_core, keep_higher_maf, block_size)

~/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
559 def wrapper(__original_func, *args, **kwargs):
560 args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 561 return __original_func(*args_, **kwargs_)
562
563 return wrapper

~/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/statgen.py in ld_prune(call_expr, r2, bp_window_size, memory_per_core, keep_higher_maf, block_size)
3336 _, stops = hl.linalg.utils.locus_windows(locally_pruned_table.locus, bp_window_size)
3337
-> 3338 entries = r2_bm.sparsify_row_intervals(range(stops.size), stops, blocks_only=True).entries(keyed=False)
3339 entries = entries.filter((entries.entry >= r2) & (entries.i < entries.j))
3340 entries = entries.select(i = hl.int32(entries.i), j = hl.int32(entries.j))

</Users/tmajaria/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1305> in sparsify_row_intervals(self, starts, stops, blocks_only)

~/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
559 def wrapper(__original_func, *args, **kwargs):
560 args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 561 return __original_func(*args_, **kwargs_)
562
563 return wrapper

~/projects/src/miniconda3/envs/hail/lib/python3.7/site-packages/hail/linalg/blockmatrix.py in sparsify_row_intervals(self, starts, stops, blocks_only)
1063 raise ValueError(f'n_rows must be less than 2^31, found {n_rows}')
1064 if len(starts) != n_rows or len(stops) != n_rows:
-> 1065 raise ValueError(f'starts and stops must both have length {n_rows} (the number of rows)')
1066 if any([start < 0 for start in starts]):
1067 raise ValueError('all start values must be non-negative')

ValueError: starts and stops must both have length 9232 (the number of rows)
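
One thing that stands out in the log above: the table written by the local pruning stage has 9233 rows, while the block matrix is 9232 x 2504. The snippet below is only a plain-Python sketch of the length check quoted from blockmatrix.py, not Hail's actual code. The function name check_row_intervals is made up for illustration, and it assumes that stops gets one entry per row of the 9233-row table. Under that assumption, an off-by-one mismatch would produce exactly this message:

```python
def check_row_intervals(n_rows, starts, stops):
    """Mimic the length check shown in the traceback (hypothetical helper, not Hail's API)."""
    if len(starts) != n_rows or len(stops) != n_rows:
        raise ValueError(
            f'starts and stops must both have length {n_rows} (the number of rows)')


# Row counts copied from the log output above.
table_rows = 9233   # "wrote table with 9233 rows"
matrix_rows = 9232  # "9232 x 2504 matrix"

# ASSUMPTION: stops has one entry per table row; the values are placeholders.
stops = list(range(table_rows))
# The traceback shows ld_prune passing range(stops.size) as starts.
starts = range(len(stops))

try:
    check_row_intervals(matrix_rows, starts, stops)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If that is what is happening, the question becomes why the table and the matrix disagree by one row, which I have not been able to work out.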

I am not sure where to go from here. I’m using Hail version 0.2.14-8dcb6722c72a locally.

Any ideas would be great!
Thanks,
-Tim

Tracking issue here: https://github.com/hail-is/hail/issues/6223