How do I create a table keyed by locus and alleles from chromosome, start position, end position, reference allele and alternate allele?

My ht has the following structure:

----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'Chr': str 
    'Start': int32 
    'End': int32 
    'Ref': str 
    'Alt': str 
    'my_annotation': str
----------------------------------------
Key: []
----------------------------------------

How can I generate “locus” and “alleles” fields equivalent to those in a MatrixTable (mt), which I can then use with key_by and for further annotation?

Hi @johnnyr ,

That’s a great question! I believe you want to do this:

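import hail as hl
# Chr must match GRCh37 contig names ('1', not 'chr1') and Start must be a 1-based position
ht = ht.annotate(locus=hl.locus(ht.Chr, ht.Start, reference_genome='GRCh37'),
                 alleles=hl.array([ht.Ref, ht.Alt]))
ht = ht.key_by(ht.locus, ht.alleles)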

I’m not sure what the End field represents. Is End always Start + length(Ref)? Hail shouldn’t need that information, as far as I know.


Hi @danking,

Thanks a lot for your quick response! It works fine for me.

> Is End always Start + length(Ref)? Hail shouldn’t need that information, as far as I know.

Yes, that’s the idea. Good to know it isn’t needed.