Importing variant call data from tsv

smcnulty · November 12, 2020, 5:25pm

I have a tsv file of variants (GRCh37) and related data. I need to add gnomAD frequency data to my table. I have the gnomAD hail table, so intersecting the data should be straightforward.

I see tutorials on importing variant data from a VCF file, but I’m struggling to get my data into the correct format to intersect with gnomAD.

Here’s an example of what my table looks like (just the first few columns and rows):

source	type	chromosome	position	reference	mutation	quality	GT	DP
HS	snp	chr1	36933434	G	A	69.5549	.	2276
HS	snp	chr1	43814978	A	T	0	.	2367
HS	snp	chr1	43814979	G	A	68	.	2377
HS	mnp	chr1	43815007	GTG	AGC	0	.	1742
HS	snp	chr1	43815008	T	A	33.5549	.	1748
HS	snp	chr1	43815008	T	C	0	.	1748

I’ve tried reformatting it like this before import in an attempt to mimic the format I see when I view the gnomAD data:

source	type	locus	alleles	quality	GT	DP
HS	snp	1:36933434	["G","A"]	185	.	6612
HS	snp	1:43814978	["A","T"]	37	.	6822
HS	snp	1:43814979	["G","A"]	148	.	6826
HS	mnp	1:43815007	["GTG","AGC"]	0	.	5402
HS	snp	1:43815008	["T","A"]	41	.	5464
HS	snp	1:43815008	["T","C"]	512.745	.	5464
HS	mnp	1:43815008	["TG","AA"]	0	.	5407
HS	del	1:43815008	["TGGCAGTTTC","AAAA"]	0	.	5135

Somehow, I can’t quite figure out how to get this into the right format to intersect with the frequency data from gnomAD. I’m sorry for the super basic question. Advice is very much appreciated.

tpoterba · November 12, 2020, 5:29pm

To join, the two tables need a common key: in this case, a locus field of type locus<GRCh37> and an alleles field of type array<str>.

Something like:

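# hl.locus expects an int32 position, so import the column as an integer
# (e.g. hl.import_table(..., impute=True) or types={'position': hl.tint32}).
ht = ht.key_by(
    locus=hl.locus(ht.chromosome, ht.position, reference_genome='GRCh37'),
    alleles=[ht.reference, ht.mutation])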

However, it looks like you have chr prefixes on your chromosomes, which GRCh37 contig names don't use. Something like this should fix that:

# Strip the 'chr' prefix so contigs match GRCh37 names ('1', not 'chr1').
ht = ht.key_by(
    locus=hl.locus(ht.chromosome.replace('chr', ''), ht.position, reference_genome='GRCh37'),
    alleles=[ht.reference, ht.mutation])
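
For completeness, here is a rough sketch of the whole flow, from importing the TSV to pulling in the gnomAD annotation. It is only an illustration under some assumptions: the function names and file paths are made up, the column names are taken from the example table above, and `freq` is assumed to be the name of the frequency field in your gnomAD table (check with `gnomad.describe()`). It also maps contigs through a small lookup rather than a plain `replace`, since UCSC-style `chrM` would otherwise become `M`, while Hail's GRCh37 reference calls that contig `MT`.

```python
# Sketch only: paths, function names, and the `freq` field name are assumptions.

# Primary GRCh37 contigs as Hail names them (no "chr" prefix, "MT" for mitochondria).
GRCH37_CONTIGS = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def ucsc_to_grch37():
    """Map UCSC-style names ('chr1', 'chrM') to GRCh37 names ('1', 'MT')."""
    mapping = {f"chr{c}": c for c in GRCH37_CONTIGS}
    mapping["chrM"] = "MT"  # a plain replace('chr', '') would give 'M'
    return mapping


def annotate_with_gnomad(tsv_path, gnomad_path):
    """Key a variant TSV by (locus, alleles) and add gnomAD frequencies."""
    import hail as hl  # imported here so the helper above works without Hail

    # hl.locus needs an int32 position, so declare the column type on import.
    ht = hl.import_table(tsv_path, types={"position": hl.tint32})

    # Translate contig names; names not in the mapping are passed through as-is.
    contig_map = hl.literal(ucsc_to_grch37())
    contig = contig_map.get(ht.chromosome, ht.chromosome)

    ht = ht.key_by(
        locus=hl.locus(contig, ht.position, reference_genome="GRCh37"),
        alleles=[ht.reference, ht.mutation],
    )

    gnomad = hl.read_table(gnomad_path)
    # Left join on the shared key; variants absent from gnomAD get a missing value.
    # `freq` is assumed to be the frequency field of your gnomAD table.
    return ht.annotate(gnomad_freq=gnomad[ht.key].freq)
```

You would call it as something like `ht = annotate_with_gnomad('variants.tsv', 'gnomad.sites.ht')` (both paths are placeholders). The join matches on exact locus and allele strings, so rows like the mnp entries will only pick up an annotation if gnomAD stores the variant in the same representation.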

smcnulty · November 12, 2020, 5:31pm

Thank you so much!

When I import my first table, is it ok to be in table format or does it need to be in matrix table format?

tpoterba · November 12, 2020, 5:31pm

table is fine!
