Finding genotype for each (exome locus, sample ID) pair


#1

I cannot find the 2-dimensional table that gives the genotype for each (exome locus, sample ID) pair. Is it in gnomad.exomes.r2.1.sites.ht somewhere? I have downloaded that file and can read it into Hail with hl.read_table, but I am not seeing what I want in there. Maybe I am failing to understand some of the field names, or maybe I am looking in the wrong place. Help, please!

Once I’ve found it, I’d like to export it (or parts of it) into a format that can be read into R, such as a TSV file. I am thinking that a write('/mypath/myfile.tsv.bgz') will do it, but if I am wrong, please let me know.


#2

gnomAD can’t release individual-level genotype data, unfortunately. Only site-level summary statistics are public.
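On the export question: write(...) saves a table in Hail's native format, not as TSV. The Table method that writes a text file is export(...), and it compresses the output when the file name ends in .bgz. Here is a minimal sketch, assuming you only want a small preview of the sites table to load into R. The function name, the paths, and the 1000-row default are placeholders I made up for illustration, and I haven't run this against the actual gnomAD file.

```python
def export_sites_preview(ht_path, out_path, n_rows=1000):
    """Export the first n_rows of a Hail Table as a (block-gzipped) TSV.

    ht_path and out_path are hypothetical placeholders, e.g. the downloaded
    gnomad.exomes.r2.1.sites.ht directory and '/mypath/myfile.tsv.bgz'.
    """
    # Imported here so the function can be defined without Hail installed.
    import hail as hl

    ht = hl.read_table(ht_path)

    # Print the schema first: this is the quickest way to see which
    # summary-statistic fields the sites table actually contains.
    ht.describe()

    # Take a small preview, turn nested struct fields into top-level
    # columns, and write a TSV (export, not write).
    preview = ht.head(n_rows).flatten()
    preview.export(out_path)
```

In R, the resulting file should be readable as ordinary gzip-compressed, tab-separated text. Array-valued fields are written as JSON-like strings, so you may prefer to select the specific fields you need before exporting.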


#3

Thank you for the quick reply! Would these data be available from gnomAD via a dbGaP application or similar?

Regardless, to what extent are there statistics available from gnomAD that would quantify linkage disequilibrium between specified loci, in the total population of samples or within any of the subpopulations? My particular interest is in the exomic loci. Thanks!


#4

Would these data be available from gnomAD via a dbGaP application or similar?

Not that I know of.

I’m not sure about LD statistics either, sorry!


#5

Ah, I get it (I hope)! The web pages that describe manipulating SampleID-by-locus matrix data are not describing the gnomAD data set; they are describing Hail functionality. If my personal data set had that level of data, then I could use Hail to manipulate it, but gnomAD doesn’t release that level of data!

No need to respond, unless it is apparent that I am still hopelessly confused, and you are kind enough to lead me!


#6

I’m not sure exactly which pages you’re referring to, but most of the public gnomAD code was code used to generate the public summary statistics from the genotype-level (sample-by-variant matrix) data.

You are totally correct that you can’t go and reproduce their analysis because the genotype data is private, but you could apply those scripts (with some adjustments, probably) to your own dataset.
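To make the LD question from earlier in the thread concrete for your own data: if you have per-sample genotypes coded as alt-allele counts (0, 1, 2), one common estimate of r² between two loci for unphased data is the squared Pearson correlation of those counts. The sketch below is plain Python with no Hail involved; the function name and the toy genotype vectors are made up for illustration, and it is not how gnomAD computes anything.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two loci's alt-allele counts.

    g1 and g2 are equal-length sequences of 0/1/2 (or None for a missing
    call), one entry per sample. Returns None when r^2 is undefined.
    """
    # Keep only samples called at both loci.
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n

    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)

    # A monomorphic locus has zero variance, so r^2 is undefined.
    if var_a == 0 or var_b == 0:
        return None

    return (cov * cov) / (var_a * var_b)


# Toy example with six samples (invented data).
locus_a = [0, 1, 2, 1, 0, None]
locus_b = [0, 1, 2, 1, 0, 2]
r2 = genotype_r2(locus_a, locus_b)
```

To get within-subpopulation values, you would subset the samples to one population before calling the function.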