Finding genotype for each (exome locus, sample ID) pair

LAN · October 26, 2018, 8:35pm

I cannot find the 2-dimensional table that gives the genotype for each (exome locus, sample ID) pair. Is it in gnomad.exomes.r2.1.sites.ht somewhere? I have downloaded that file and can read it into Hail with hl.read_table, but I am not seeing what I want in there. Maybe I am failing to understand some of the field names, or maybe I am looking in the wrong place … help please!

Once I’ve found it, I’d like to export it (or parts of it) into a format that can be read into R, such as a TSV file. I am thinking that a write('/mypath/myfile.tsv.bgz') will do it, but if I am wrong, please let me know.

tpoterba · October 26, 2018, 9:02pm

Unfortunately, gnomAD can’t release individual-level genotype data, only summary statistics about sites.

LAN · October 28, 2018, 2:43pm

Thank you for the quick reply! Would these data be available from gnomAD via a dbGaP application or similar?

Regardless, to what extent are there statistics available from gnomAD that would quantify linkage disequilibrium between specified loci, in the total population of samples or within any of the subpopulations? My particular interest is in the exomic loci. Thanks!

tpoterba · October 28, 2018, 3:09pm

“Would these data be available from gnomAD via a dbGaP application or similar?”

Not that I know of.

I’m not sure about LD statistics either, sorry!

LAN · October 29, 2018, 7:06pm

Ah, I get it (I hope)! The web pages that describe manipulating SampleID-by-locus matrix data are not describing the gnomAD data set; they are describing Hail functionality. If my personal data set had that level of data, then I could use Hail to manipulate it, but the public gnomAD release doesn’t include that level of data!

No need to respond, unless it is apparent that I am still hopelessly confused, and you are kind enough to lead me!

tpoterba · October 30, 2018, 10:20am

I’m not sure exactly which pages you’re referring to, but most of the public gnomAD code is the code that was used to generate the public summary statistics from the genotype-level (sample-by-variant matrix) data.

You are totally correct that you can’t go and reproduce their analysis because the genotype data is private, but you could apply those scripts (with some adjustments, probably) to your own dataset.
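
To make the distinction in this thread concrete, here is a minimal sketch in plain Python (not Hail, and not gnomAD's actual pipeline) of how a sample-by-locus genotype matrix collapses into per-site summary statistics, which can then be written out as TSV for R. The sample IDs, loci, genotype values, and the helper names `site_summary` and `summaries_to_tsv` are all hypothetical, and genotypes are assumed to be diploid alternate-allele counts.

```python
import csv
import io

# Hypothetical toy data: one entry per locus, one call per sample.
# Each call is the number of alternate alleles (0, 1, 2), or None if missing.
samples = ["S1", "S2", "S3", "S4"]
genotypes = {
    "1:1000": [0, 1, 2, None],
    "1:2000": [0, 0, 1, 0],
}


def site_summary(calls):
    """Collapse one locus's per-sample diploid calls into (AC, AN, AF)."""
    called = [g for g in calls if g is not None]
    ac = sum(called)       # alternate allele count
    an = 2 * len(called)   # number of called alleles, assuming diploid samples
    af = ac / an if an else None
    return ac, an, af


def summaries_to_tsv(genotype_matrix):
    """Return a TSV string with one summary row per locus (no per-sample data)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["locus", "AC", "AN", "AF"])
    for locus, calls in genotype_matrix.items():
        ac, an, af = site_summary(calls)
        writer.writerow([locus, ac, an, "NA" if af is None else f"{af:.4f}"])
    return buf.getvalue()


tsv_text = summaries_to_tsv(genotypes)
print(tsv_text)
```

Once the rows are summarized this way, the per-sample calls cannot be recovered from the output, which is why a sites-only table has no (locus, sample ID) genotype entries. For the export question earlier in the thread: in Hail, `Table.write` produces Hail's native format, while `Table.export` is the method intended for text output such as a `.tsv.bgz` file.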
