Importing variant call data from tsv

I have a tsv file of variants (GRCh37) and related data. I need to add gnomAD frequency data to my table. I have the gnomAD Hail Table, so intersecting the data should be straightforward.

I see tutorials on importing variant data from a VCF file, but I’m struggling to get my data into the correct format to intersect with gnomAD.

Here’s an example of what my table looks like (just the first few columns and rows):

source type chromosome position reference mutation quality GT DP
HS snp chr1 36933434 G A 69.5549 . 2276
HS snp chr1 43814978 A T 0 . 2367
HS snp chr1 43814979 G A 68 . 2377
HS mnp chr1 43815007 GTG AGC 0 . 1742
HS snp chr1 43815008 T A 33.5549 . 1748
HS snp chr1 43815008 T C 0 . 1748

I’ve tried reformatting it like this before import, in an attempt to mimic the format I see when I view the gnomAD data:

source type locus alleles quality GT DP
HS snp 1:36933434 ["G","A"] 185 . 6612
HS snp 1:43814978 ["A","T"] 37 . 6822
HS snp 1:43814979 ["G","A"] 148 . 6826
HS mnp 1:43815007 ["GTG","AGC"] 0 . 5402
HS snp 1:43815008 ["T","A"] 41 . 5464
HS snp 1:43815008 ["T","C"] 512.745 . 5464
HS mnp 1:43815008 ["TG","AA"] 0 . 5407
HS del 1:43815008 ["TGGCAGTTTC","AAAA"] 0 . 5135

Somehow, I can’t quite figure out how to get this into the right format to intersect with the frequency data from gnomAD. I’m sorry for the super basic question. Advice is very much appreciated.

In order to join, you’ll need a common key: in this case, a locus field of type locus<GRCh37> and an alleles field of type array<str>.

Something like:

import hail as hl
ht = ht.key_by(locus=hl.locus(ht.chromosome, ht.position, reference_genome='GRCh37'),
               alleles=[ht.reference, ht.mutation])

However, it looks like you have "chr" prefixes on your chromosome names, and GRCh37 contigs are named "1", "2", and so on. Something like this should fix that:

ht = ht.key_by(locus=hl.locus(ht.chromosome.replace('chr', ''), ht.position, reference_genome='GRCh37'),
               alleles=[ht.reference, ht.mutation])

Thank you so much!

When I import my first table, is it OK for it to be a Table, or does it need to be a MatrixTable?

table is fine!