Annotate with bed files in hail 0.2beta

Hi Hail team,

We are trying to migrate to Hail 0.2 beta.

In Hail 0.1 one could use bed files to annotate variants:
bed = KeyTable.import_bed('genomicSuperDups.bed')
vds = vds.annotate_variants_table(bed, root='va.genomicSuperDups')

How can this be done in Hail 0.2 beta?
The import works:
bed = hc.import_bed('genomicSuperDups.bed')
But we can't find a way to annotate the rows.
Could you please help?

Regards,
Tim

Hi Tim,

The equivalent in 0.2 is:

bed = hl.import_bed('genomicSuperDups.bed')
ds = ds.annotate_rows(genomicSuperDups = bed[ds.locus])

This adds a new row field, genomicSuperDups, to the matrix table, with type Struct{} (the empty struct). If the locus is not contained in any interval, the struct will be missing (not empty). To keep only the variants that fall within intervals from the BED file, you can write ds.filter_rows(hl.is_defined(ds.genomicSuperDups)).
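The difference between a missing struct and an empty one can be illustrated without Hail. The following is a plain-Python analogy only, not how Hail implements the lookup: the names INTERVALS, lookup and is_defined are hypothetical, None stands in for "missing", and an empty dict stands in for the empty struct.

```python
from typing import Optional

# Hypothetical stand-in for a keyed interval table: (contig, start, end),
# using 1-based, end-inclusive coordinates. Illustrative values only.
INTERVALS = [("1", 1, 10000), ("1", 10016, 10464)]


def lookup(contig: str, position: int) -> Optional[dict]:
    """Return an empty dict if the locus lies in an interval, else None."""
    for c, start, end in INTERVALS:
        if c == contig and start <= position <= end:
            return {}  # defined but empty, analogous to the empty struct
    return None  # analogous to a missing value


def is_defined(value) -> bool:
    # Truthiness would be wrong here: {} is falsy in Python but still defined.
    return value is not None


loci = [("1", 500), ("1", 10005), ("1", 10020)]
kept = [locus for locus in loci if is_defined(lookup(*locus))]
print(kept)  # [('1', 500), ('1', 10020)]
```

The point of the sketch is that the filter has to test for "defined", not for "non-empty", because a hit carries no fields at all.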

For BED files with an extra column holding an annotation for each interval, you can do:

bed = hl.import_bed('genomicSuperDups_w_annotation.bed')
ds = ds.annotate_rows(target = bed[ds.locus].target)

This creates a new row field, target, of type tstr (string). Any variant not contained in an interval will have a missing value for target.

I have a PR in progress to add these examples to the documentation.

Best,
Jackie

Hi Jackie,

Thank you. It works.
But:
hl.import_bed may have an unintended behavior.
BED files are 0-based, but import_bed adds 1 to both the start and the end position.
This can cause the following error at the ends of chromosomes:

Invalid locus '2:243199374' found. Position '243199374' is not within the range [1-243199373] for reference genome 'GRCh37'

Would it be better to add 1 only to the start position?
Regards,
Tim

Workaround:
awk -F"\t" '{print $1"\t"$2"\t"($3-1)}' lowcomplexityregions.bed > lowcomplexityregions2.bed
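For anyone without awk at hand, the same workaround can be sketched in Python. This is a rough equivalent under stated assumptions: the input is tab-separated, has at least three columns, and contains no header or blank lines. The function name shift_bed_end is hypothetical, and the in-memory streams stand in for the real files.

```python
import io


def shift_bed_end(src, dst):
    """Copy a tab-separated BED stream, subtracting 1 from the end column."""
    for line in src:
        fields = line.rstrip("\n").split("\t")
        # Keep only the first three columns, as the awk command does.
        chrom, start, end = fields[0], fields[1], int(fields[2])
        dst.write(f"{chrom}\t{start}\t{end - 1}\n")


# In-memory example; with real files, pass open(...) handles instead.
src = io.StringIO("1\t0\t10000\n1\t10015\t10464\n")
dst = io.StringIO()
shift_bed_end(src, dst)
print(dst.getvalue(), end="")
```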

bed-file
1 0 10000
1 10015 10464
1 10655 10784

bed = hc.import_bed('lowcomplexityregions.bed')
vds = vds.annotate_rows(lowcomplexityregions = bed[vds.locus])

In [7]: bed.show(3)
+-------------------------+
| interval                |
+-------------------------+
| interval<locus>         |
+-------------------------+
| [1:1-1:10001)           |
| [1:10016-1:10465)       |
| [1:10656-1:10785)       |
+-------------------------+
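The arithmetic behind the error can be checked by hand. The sketch below compares the two conversions discussed in this thread: adding 1 to both coordinates, and adding 1 to the start only. It is plain Python, not Hail code; the function names and the example BED record are hypothetical, and the contig length is the upper bound quoted in the error message.

```python
GRCH37_CHR2_LENGTH = 243199373  # upper bound quoted in the error message


def shifted_both(start, end):
    """Add 1 to both BED coordinates, as seen in the bed.show(3) output."""
    return start + 1, end + 1


def shifted_start_only(start, end):
    """Add 1 to the start only; the end is then read as inclusive."""
    return start + 1, end


# Hypothetical BED record whose end equals the contig length.
bed_start, bed_end = 243199273, 243199373

for convert in (shifted_both, shifted_start_only):
    lo, hi = convert(bed_start, bed_end)
    in_range = 1 <= lo <= GRCH37_CHR2_LENGTH and 1 <= hi <= GRCH37_CHR2_LENGTH
    print(convert.__name__, lo, hi, in_range)
```

With both coordinates shifted, the end lands one position past the last base of the contig, which matches the position reported in the error. With only the start shifted, both endpoints stay within the valid range.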

Hi Tim,

Thank you for reporting this issue. It is definitely not the intended behavior. I’ll fix it later today.

Best,
Jackie

For tracking purposes: https://github.com/hail-is/hail/pull/3250

Hi Tim,

The fix should now be in the latest version of Hail. Please let me know if you are still running into problems.

Best,
Jackie