I’m trying to run a phewas with hundreds of phenotypes. My plan is to use linear_regression_rows. Some of the phenotypes have many missing values; others do not. Thus, I want my outcome variable to be a list of lists, not a single list, so that each phenotype is treated independently. If I had three phenotypes, I’d probably write the code as follows:
With hundreds of phenotypes, however, it is difficult to manually list them all out. How can I efficiently create a list of lists that I can use as the y parameter in linear_regression_rows?
First, let me clarify that if two phenotypes appear in the same inner list, e.g. A and B in [[A, B], [C]], they are treated as independent regressions (the same as if we had written [[A],[B],[C]]), but are computed in a more computationally efficient way. The only requirement for appearing in the same inner list is a shared missingness pattern. Essentially, this expression should evaluate to true:
I dont know i can buzz in or not but when u run linear regression one at a time. it will consider missingness for each phenotype seperately. say if a phenotype have n values out of m samples, while running regression it will filter samples which doesnt have values in phenotypes and do regression based on n values. Thats what i observer for my analysis. How many penotypes are you running on? @Mark . ps: sorry if i buzz with wrong answer. U can use R to get all missing values for n number of phenotypes columns.
phens = [[mt.pheno] for x in mt.pheno]
phewas = hl.linear_regression_rows(y=phens, x=hl.gp_dosage(mt.GP),covariates=[1.0,mt.covariates.age,mt.covariates.sex])
Unfortunately, I got the following error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File “/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py”, line 1159, in send_command
raise Py4JNetworkError(“Answer from Java side is empty”)
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py”, line 985, in send_command
response = connection.send_command(command)
File “/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py”, line 1164, in send_command
“Error while receiving”, e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
It also said:
Py4JError: An error occurred while calling o102.pyPersist.
Any idea why this might be happening?
I don’t know if it’s relevant, but all my column names in the phenotype matrix are just numbers. Could that be causing a problem?
Sorry for the delay, I had some bad notifications settings.
This usually indicates that you don’t have BLAS libraries appropriately installed and configured. If you’re on Ubuntu or Debian, you might try: apt-get install libopenblas liblapack3.