In Hail 0.1 one could use bed files to annotate variants:
bed = KeyTable.import_bed('genomicSuperDups.bed')
vds = vds.annotate_variants_table(bed, root='va.genomicSuperDups')
How can this be done in Hail 0.2 beta?
Import works.
bed = hc.import_bed('genomicSuperDups.bed')
But we can't find a way to annotate the rows.
Could you please help?
bed = hl.import_bed('genomicSuperDups.bed')
ds = ds.annotate_rows(genomicSuperDups = bed[ds.locus])
This adds a new row field, genomicSuperDups, to the matrix table. Its type is Struct{} (the empty struct). If the locus is not contained in any interval, the struct is missing (not empty). To keep only the variants that fall within intervals from the BED file, you can write ds = ds.filter_rows(hl.is_defined(ds.genomicSuperDups)).
For BED files with an extra column that’s the annotation for the interval, you can do
bed = hl.import_bed('genomicSuperDups_w_annotation.bed')
ds = ds.annotate_rows(target = bed[ds.locus].target)
This creates a new row field, target, of type tstr (string). Any variant not contained in an interval will have a missing value for target.
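To make the missing-versus-defined behaviour concrete, here is a minimal plain-Python model of the lookup. It is not Hail code and says nothing about how Hail implements the join; the interval rows, the loci, and the helper name lookup_target are all made up for illustration, with None standing in for a missing value.

```python
# Plain-Python model of the lookup semantics; this is NOT Hail code.
from typing import Optional

# Hypothetical parsed BED rows:
# (contig, 1-based start, 1-based inclusive end, target)
intervals = [
    ("1", 101, 200, "dupA"),
    ("2", 501, 600, "dupB"),
]

def lookup_target(contig: str, position: int) -> Optional[str]:
    """Return the target of the first interval containing the locus, else None."""
    for c, start, end, target in intervals:
        if c == contig and start <= position <= end:
            return target
    return None  # plays the role of a missing value

# Hypothetical variant loci to annotate.
loci = [("1", 150), ("1", 250), ("2", 600)]

# Analogue of annotating each row with the interval's target.
annotated = [(contig, pos, lookup_target(contig, pos)) for contig, pos in loci]

# Analogue of filtering to rows where the annotation is defined.
kept = [row for row in annotated if row[2] is not None]
```

In this model the locus 1:250 falls outside every interval, so it gets None and is dropped by the filter, while the other two loci are kept.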
I have a PR in progress to add these examples to the documentation.
Thank you. It works.
But:
hl.import_bed possibly contains an unwanted feature.
BED files are 0-based. import_bed adds 1 to both the start and the end position.
That can cause the following error at the end of chromosomes:
Invalid locus `2:243199374' found. Position `243199374' is not within the range [1-243199373] for reference genome `GRCh37'.
Would it be better to add 1 only to the start position?
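The arithmetic behind the question can be sketched in plain Python. This is only an illustration of the coordinate conversion, not Hail's implementation; it assumes the usual BED convention that intervals are 0-based and half-open (the end is excluded), and the example record and function names are hypothetical.

```python
# Plain-Python sketch of BED coordinate conversion; NOT Hail's implementation.
# Assumption: BED intervals are 0-based and half-open, [chromStart, chromEnd).

def bed_to_one_based_closed(chrom_start, chrom_end):
    """Add 1 to the start only: 1-based interval with both ends included."""
    return chrom_start + 1, chrom_end

def bed_shift_both(chrom_start, chrom_end):
    """Add 1 to both ends: only valid if the end is treated as excluded."""
    return chrom_start + 1, chrom_end + 1

# Chromosome 2 length in GRCh37, taken from the range in the error message.
CHR2_LENGTH_GRCH37 = 243199373

# Hypothetical BED record covering the last 1000 bases of chromosome 2.
bed_start, bed_end = CHR2_LENGTH_GRCH37 - 1000, CHR2_LENGTH_GRCH37

closed = bed_to_one_based_closed(bed_start, bed_end)
shifted = bed_shift_both(bed_start, bed_end)

# The closed form stays within [1, CHR2_LENGTH_GRCH37].
print(closed, closed[1] <= CHR2_LENGTH_GRCH37)
# The shifted end is one past the last base, which fails if it is
# validated as an ordinary locus.
print(shifted, shifted[1] <= CHR2_LENGTH_GRCH37)
```

Both forms describe the same 1000 bases; they differ only in whether the end coordinate is included, which is why the shifted end can land one position past the end of the chromosome.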
Regards,
Tim