I want to remove a snp from the regression equation

aleriedy · February 27, 2019, 7:43pm

Hi
I want to remove a specific SNP or SNPs from the regression equation and add them as covariates.

Thank you

tpoterba · February 27, 2019, 8:47pm

Here’s code that will take SNP IDs and run regression using them as a covariate:


import hail as hl

# Variants to condition on, written as 'contig:position:ref:alt'.
my_snps = ['1:1:A:T', '16:29102912:T:TTT']

# Keep only the rows for those variants.
mt_filt = mt.annotate_globals(
    snps = hl.set([hl.parse_variant(x) for x in my_snps]))
mt_filt = mt_filt.filter_rows(mt_filt.snps.contains(mt_filt.row_key))

# For each sample, collect its alt-allele count at each selected variant.
sample_genos = mt_filt.annotate_cols(
    genotypes = hl.agg.collect(mt_filt.GT.n_alt_alleles()))
# Join the per-sample genotypes back onto the full dataset by column key.
mt = mt.annotate_cols(
    snp_covs = sample_genos.cols()[mt.col_key].genotypes
)

# y is the phenotype, x is the genotype being tested.
gwas = hl.linear_regression_rows(
    y=mt.pheno,
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0, mt.age, mt.PC1, mt.PC2, *(mt.snp_covs[i] for i in range(len(my_snps)))])
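
To illustrate what adding a SNP as a covariate does to the tested SNP's effect estimate, here is a minimal pure-Python sketch that does not use Hail. The toy genotypes, the phenotype, and the helper names `ols` and `snp_effect` are all made up for illustration; the phenotype is constructed to depend only on the conditioning SNP.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def snp_effect(test, pheno, covariates):
    """Coefficient of the tested SNP, given an intercept and per-sample covariates."""
    X = [[1.0, *covs, t] for covs, t in zip(covariates, test)]
    return ols(X, pheno)[-1]


# Toy data (assumed): alt-allele counts for 8 samples at two correlated SNPs.
cond_snp = [0, 1, 2, 0, 1, 2, 0, 1]
test_snp = [0, 1, 2, 0, 1, 1, 0, 2]
# The phenotype is driven only by the conditioning SNP.
pheno = [3.0 * g for g in cond_snp]

# Without the covariate, the tested SNP picks up the signal through correlation.
marginal = snp_effect(test_snp, pheno, [[] for _ in test_snp])
# With the conditioning SNP as a covariate, the tested SNP's effect vanishes.
conditional = snp_effect(test_snp, pheno, [[g] for g in cond_snp])
```

In this toy setup the marginal estimate is clearly non-zero, while the conditional estimate is zero up to floating-point error, which is the behaviour one hopes for when conditioning on a known signal.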

aleriedy · February 27, 2019, 10:05pm

Thank you so much

hhx037 · February 28, 2019, 4:59pm

@tpoterba

one follow-up question, if one wants to use GP instead of GT, all there is to do is change sample_genos like this:

sample_genos = mt_filt.annotate_cols(genotypes = hl.agg.collect(hl.gp_dosage(mt_filt.GP)))

Is that correct?

tpoterba · February 28, 2019, 6:06pm

yup, exactly.
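
For reference, the dosage that hl.gp_dosage computes from a biallelic GP array is the expected alt-allele count, GP[1] + 2 * GP[2]. A minimal plain-Python sketch of the same arithmetic (the helper name `expected_dosage` is made up here and is not part of Hail):

```python
def expected_dosage(gp):
    """Expected alt-allele count from [P(hom-ref), P(het), P(hom-alt)]."""
    if len(gp) != 3:
        raise ValueError("expected three genotype probabilities (biallelic site)")
    return gp[1] + 2 * gp[2]


# Hypothetical genotype probabilities for three samples.
het_call = expected_dosage([0.0, 1.0, 0.0])
hom_alt_call = expected_dosage([0.0, 0.0, 1.0])
uncertain_call = expected_dosage([0.25, 0.5, 0.25])
```

Unlike GT.n_alt_alleles(), which is an integer, the dosage is a float between 0 and 2, so uncertain calls contribute fractional values to the covariate.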

Also note that the second line (mt_snps_lit = hl.literal([hl.parse_variant(x) for x in my_snps])) doesn’t actually work due to a Python limitation. I’ll think about a fix

tpoterba · February 28, 2019, 6:09pm

OK, edited. should work now

hhx037 · March 1, 2019, 10:00am

Great, thank you.

hhx037 · March 1, 2019, 10:15am

Sorry, one more question. In the UK Biobank, X and XY genotypes are given only for a subset of the samples, so hl.MatrixTable.union_rows doesn’t work. That’s fine, X and XY can be analysed separately, but what to do if one needs to use the genotype of a variant located on an autosome as a covariate?
Will

mt = mt.annotate_cols(snp_covs = sample_genos.cols()[mt.col_key].genotypes)

applied on X or XY only annotate the samples in common or will one run into trouble?

tpoterba · March 1, 2019, 3:26pm

If you have the autosomes, X, and XY as separate matrix tables, and each contains SNPs that you want to use as covariates, then I think you should build one sample_genos for each and annotate each matrix table with all three results before running the regression.
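
On the question of samples that are not shared: as far as I understand, indexing a Hail table by key (as in sample_genos.cols()[...]) behaves like a left join, so a sample with no match gets a missing covariate, and linear_regression_rows only uses samples whose phenotype and covariates are all defined. A plain-Python sketch of that lookup behaviour, with made-up sample IDs and genotypes (this is not Hail code):

```python
# Hypothetical per-sample genotypes at one autosomal covariate SNP.
autosomal_cov = {"s1": 0, "s2": 1, "s3": 2, "s4": 1}

# Hypothetical samples present in the X-chromosome dataset.
x_samples = ["s2", "s4", "s5"]

# Keyed lookup: a sample absent from the autosomal data gets None (missing).
x_covs = {s: autosomal_cov.get(s) for s in x_samples}

# Only samples with a defined covariate would enter the regression.
usable = [s for s in x_samples if x_covs[s] is not None]
```

If every X or XY sample also appears in the autosomal data, as in a subset, nothing is lost; samples without a match would simply drop out rather than cause an error.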
