I have a TSV file of variants (GRCh37) and related data, and I need to add gnomAD frequency data to my table. I have the gnomAD Hail Table, so intersecting the data should be straightforward.
I see tutorials on importing variant data from a VCF file, but I’m struggling to get my data into the correct format to intersect with gnomAD.
Here’s an example of what my table looks like (just the first few columns and rows):
source | type | chromosome | position | reference | mutation | quality | GT | DP |
---|---|---|---|---|---|---|---|---|
HS | snp | chr1 | 36933434 | G | A | 69.5549 | . | 2276 |
HS | snp | chr1 | 43814978 | A | T | 0 | . | 2367 |
HS | snp | chr1 | 43814979 | G | A | 68 | . | 2377 |
HS | mnp | chr1 | 43815007 | GTG | AGC | 0 | . | 1742 |
HS | snp | chr1 | 43815008 | T | A | 33.5549 | . | 1748 |
HS | snp | chr1 | 43815008 | T | C | 0 | . | 1748 |
Before importing, I tried reformatting it like this in an attempt to mimic the format I see when I view the gnomAD data:
source | type | locus | alleles | quality | GT | DP |
---|---|---|---|---|---|---|
HS | snp | 1:36933434 | ["G","A"] | 185 | . | 6612 |
HS | snp | 1:43814978 | ["A","T"] | 37 | . | 6822 |
HS | snp | 1:43814979 | ["G","A"] | 148 | . | 6826 |
HS | mnp | 1:43815007 | ["GTG","AGC"] | 0 | . | 5402 |
HS | snp | 1:43815008 | ["T","A"] | 41 | . | 5464 |
HS | snp | 1:43815008 | ["T","C"] | 512.745 | . | 5464 |
HS | mnp | 1:43815008 | ["TG","AA"] | 0 | . | 5407 |
HS | del | 1:43815008 | ["TGGCAGTTTC","AAAA"] | 0 | . | 5135 |
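For reference, the reformatting step can be sketched in plain Python. This is only a minimal sketch: it assumes the column names from the first table (`chromosome`, `position`, `reference`, `mutation`), and the helper name `to_locus_alleles` and the inline `example_tsv` string are made up for illustration.

```python
import csv
import io


def to_locus_alleles(row):
    """Build a locus string and an alleles list from one row of the TSV.

    Assumes the row has 'chromosome', 'position', 'reference' and
    'mutation' columns, as in the first table above.
    """
    contig = row["chromosome"]
    # Drop the "chr" prefix so the contig looks like "1" rather than "chr1".
    if contig.startswith("chr"):
        contig = contig[3:]
    return {
        "locus": f"{contig}:{row['position']}",
        "alleles": [row["reference"], row["mutation"]],
    }


# Two rows copied from the first table, written out as tab-separated text.
example_tsv = (
    "source\ttype\tchromosome\tposition\treference\tmutation\n"
    "HS\tsnp\tchr1\t36933434\tG\tA\n"
    "HS\tmnp\tchr1\t43815007\tGTG\tAGC\n"
)

reader = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
rows = [to_locus_alleles(r) for r in reader]

for r in rows:
    print(r["locus"], r["alleles"])
```

This only produces strings and Python lists, not Hail's typed fields. Within Hail itself, I believe the equivalent would be to read the TSV with `hl.import_table`, build a typed locus with `hl.locus(..., reference_genome='GRCh37')`, and key the table by `locus` and `alleles` so it lines up with the gnomAD table, but that part is an assumption I have not verified here.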
Somehow, I can’t quite figure out how to get this into the right format to intersect with the frequency data from gnomAD. I’m sorry for the super basic question. Advice is very much appreciated.