P-value "bands" in linear regression

Hi folks, I am looking for ideas on a curious effect I’m seeing after running a linear regression in Hail. To be clear, I’m not sure whether this is due to something inherent in the data, to linear regression itself, or possibly to how regression is implemented in Hail. I used GT.n_alt_alleles() as the predictor variable (along with other covariates) and serum phosphate level as the response variable. I see “bands” of p-values in the volcano plot:


And the distributions of betas and p-values look, I think, as they should:



Any ideas what could be going on?


My hypothesis is that these bands correspond to loci where a single individual is non-reference (the second band to loci with two non-reference individuals, and so on). What’s the full call to the method in Hail, and how many samples do you have? If you’re using other covariates, that would contradict my idea.

I think digging into the genotype + covariate configuration at a couple points on each line will reveal what’s going on.
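
To make the allele-count idea concrete, here is a minimal pure-Python sketch (not Hail code) under a simplifying assumption: an intercept-only model with no other covariates, which is not the setup in the original post. In that case the t-statistic depends only on beta and on the spread of the genotype vector, which for k heterozygous carriers among n samples is k - k^2/n. Every variant with the same carrier count therefore falls on the same curve in a volcano plot. The phenotype is simulated, the names `simple_regression`, `band_curve`, and `approx_neg_log10_p` are made up for illustration, and the p-value uses a normal approximation to the t distribution.

```python
import math
import random


def simple_regression(x, y):
    """OLS of y on x with an intercept; returns (beta, t_statistic)."""
    n = len(y)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    # Residual sum of squares computed directly from the fitted line.
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se


def band_curve(beta, k, n, syy):
    """Predicted t-statistic for a variant with k heterozygous carriers.

    Uses Sxx = k - k^2/n and RSS = Syy - beta^2 * Sxx, so t is a function
    of beta alone once k, n, and the phenotype variance are fixed.
    """
    sxx = k - k * k / n
    rss = syy - beta * beta * sxx
    return beta * math.sqrt(sxx * (n - 2) / rss)


def approx_neg_log10_p(t):
    """Two-sided -log10(p) using a normal approximation (assumes large n)."""
    return -math.log10(math.erfc(abs(t) / math.sqrt(2)))


random.seed(0)
n = 500  # hypothetical sample size
y = [random.gauss(0.0, 1.0) for _ in range(n)]  # simulated phenotype
y_mean = sum(y) / n
syy = sum((yi - y_mean) ** 2 for yi in y)

results = []
for k in (1, 2, 3):  # number of heterozygous carriers per variant
    for _ in range(5):
        x = [0] * n
        for i in random.sample(range(n), k):
            x[i] = 1
        beta, t = simple_regression(x, y)
        results.append((k, beta, t, band_curve(beta, k, n, syy)))

for k, beta, t, predicted in results:
    print(f"k={k} beta={beta:+.3f} t={t:+.3f} band={predicted:+.3f} "
          f"-log10p~{approx_neg_log10_p(t):.3f}")
```

In this sketch the fitted t and the value from `band_curve` agree for every simulated variant, so each carrier count traces its own band. With additional covariates the relationship would only hold approximately, so checking the carrier count at a few points on each band in the real data is still the more direct test.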

My hypothesis is that it might be from LD, but I don’t know how we can prove it. Also, the bands are mostly from non-significant sites.