Hi Hail team,
I have a very large list of phenotypes in my MatrixTable and want to run linear regression on a subset of them at a time.
I can't work out how to define these subsets by phenotype name when building the y list of phenotypes. Could you point me in the right direction?
mt.phenotype.describe()
--------------------------------------------------------
Type:
struct {
...
fbc_neut_p: float64,
fbc_eo_p: float64,
fbc_mono_p: float64,
fbc_lymph_p: float64,
fbc_baso_p: float64,
fbc_ret_p: float64,
fbc_hlr_p: float64,
fbc_hct: float64,
fbc_pct: float64,
fbc_hgb: float64,
fbc_rbc: float64,
fbc_wbc: float64,
fbc_mpv: float64,
fbc_plt: float64,
fbc_rdw: float64,
fbc_pdw: float64,
fbc_mcv: float64,
fbc_mch: float64,
fbc_mchc: float64, ...
This works:
import hail as hl
from bokeh.io import output_file, save

phens = [mt.phenotype[i] for i in range(10, 100)]
gwas = hl.linear_regression_rows(
    y=phens,
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0, pcas1[0:3][0]],
    pass_through=[mt.rsid])
# phens has 90 entries, so loop over its length rather than range(0, 100)
for i in range(len(phens)):
    print(f"Plotting {i}:{phens[i]}")
    p = hl.plot.manhattan(gwas.p_value[i], title=f"Interval WGS GWAS Manhattan Plot: {phens[i]}")
    output_file(f"{i}.WGS-manhattan-{phens[i]}.html")
    save(p)
    #p.show()
What I am trying to do is select the phenotypes based on their name:
fbc = []
for pheno in p:
    if pheno.startswith('fbc'):
        fbc.append(pheno)
fbc1 = mt.phenotype.select(*fbc)
# THIS does not work:
gwas = hl.linear_regression_rows(
    y=[fbc1],
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0, pcas1[0:10][0]],
    pass_through=[mt.rsid])
I get this error:
TypeError: linear_regression_rows: parameter 'y': expected (expression of type float64
or Sequence[expression of type float64] or Sequence[Sequence[expression of type float64]]),
found list:
['fbc_neut_p', 'fbc_eo_p', 'fbc_mono_p', 'fbc_lymph_p', 'fbc_baso_p', 'fbc_ret_p', 'fbc_hlr_p', 'fbc_hct', 'fbc_pct', 'fbc_hgb', 'fbc_rbc', 'fbc_wbc', 'fbc_mpv', 'fbc_plt', 'fbc_rdw', 'fbc_pdw', 'fbc_mcv', 'fbc_mch', 'fbc_mchc', 'fbc_ret', 'fbc_hlr', 'fbc_neut', 'fbc_mono', 'fbc_baso', 'fbc_eo', 'fbc_lymph', 'fbc_irf', 'fbc_myeloid_wbc', 'fbc_gran', 'fbc_eo_baso_sum', 'fbc_neut_eo_sum', 'fbc_baso_neut_sum', 'fbc_gran_p_myeloid_wbc', 'fbc_eo_p_gran', 'fbc_neut_p_gran', 'fbc_baso_p_gran']
I want to use this fbc list to select the phenotypes from the MatrixTable and pass them as the y variable, instead of typing them out one by one. That would also let me print their names in my plots.
Any help would be appreciated. Thank you.
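
For reference, the error says that y received a list of plain strings, while linear_regression_rows expects float64 expressions. One possible way to bridge the two is to map each selected name back to its field. The sketch below is not a confirmed solution: select_by_prefix and demo are hypothetical names, a plain dict stands in for mt.phenotype so the snippet runs without Hail, and the commented Hail lines assume that mt.phenotype can be iterated for field names and indexed by name like a mapping.

```python
from typing import Any, Dict, Mapping


def select_by_prefix(fields: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Return {name: value} for every field whose name starts with `prefix`.

    Hypothetical helper: works on any mapping from field names to values.
    """
    return {name: fields[name] for name in fields if name.startswith(prefix)}


# Plain-dict stand-in for mt.phenotype (illustrative values only).
demo = {"age": 0, "fbc_neut_p": 1, "fbc_eo_p": 2, "bmi": 3}

fbc = select_by_prefix(demo, "fbc")
fbc_names = list(fbc)            # names, usable in plot titles and file names
fbc_values = list(fbc.values())  # the values to pass as y, in the same order

# With Hail this might look as follows (assumption, not run here):
# fbc = select_by_prefix(mt.phenotype, "fbc")
# gwas = hl.linear_regression_rows(
#     y=list(fbc.values()),
#     x=mt.GT.n_alt_alleles(),
#     covariates=[1.0, pcas1[0:10][0]],
#     pass_through=[mt.rsid])
# for i, name in enumerate(fbc):
#     p = hl.plot.manhattan(gwas.p_value[i], title=f"Manhattan Plot: {name}")
```

Keeping names and values in one dict means index i in gwas.p_value lines up with the i-th name, so the plot titles can use the phenotype name instead of the expression.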