Hi, I’m new to Hail. I would like to run a GWAS (linear regression), but my genotype data is stored in an HDF5 file. The HDF5 file contains every sample’s allele count at each locus. I think it should be straightforward to use these allele counts to run a GWAS in Hail, but I’m not sure about the following:
- How can I read HDF5 into Hail? Using the `h5py` Python package, I can read the `.hdf5` file into Python (`h5py.File('xx.hdf5', 'r')`), but how can I pass the allele counts in this HDF5 file to Hail?
- Supposing it is possible to pass the data to Hail, how do I run a GWAS using allele counts (which are 0, 1, 2)? I’ll also need to compute PCA; can I do that in Hail using allele counts? And how do I include other covariates stored in a separate file?
Here is an example HDF5 file that you can check:
Thank you very much for your help!
Hi Liverpool! Thank you for your interest in Hail!
- Yes, you should be able to import HDFS files into Hail, which will then be formatted into a Hail matrix table for ease of computation: https://hail.is/docs/0.2/utils/index.html#hail.utils.hadoop_open
- As for a linear regression on genotypes or allele counts, I would highly suggest looking through our GWAS tutorial: https://hail.is/docs/0.2/tutorials/01-genome-wide-association-study.html
If you would like a video tutorial: https://www.youtube.com/playlist?list=PLlMMtlgw6qNg7im-zHSWu7M1N8xigpv4m
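For the second question, here is a minimal sketch of how allele counts could feed PCA and a linear regression. It assumes the counts are already in a matrix table `mt` with an entry field `x` and a column key `col_id`, and that there are no missing phenotypes or covariates to worry about. The function name and the covariate file layout (columns `sample_id`, `pheno`, `age`) are hypothetical. `hl.import_table`, `hl.pca`, and `hl.linear_regression_rows` are existing Hail 0.2 functions, but treat this as an untested outline rather than a recommended pipeline.

```python
def gwas_on_allele_counts(mt, covariates_path, n_pcs=10):
    """Sketch: linear-regression GWAS on 0/1/2 allele counts.

    Assumptions (hypothetical, not from the thread):
      * `mt` has entry field `x` (allele count) and column key `col_id`
      * `covariates_path` is a TSV with columns sample_id, pheno, age
    """
    import hail as hl  # imported lazily so this file loads without Hail

    # Join the external covariates onto the samples (columns) by sample ID.
    covs = hl.import_table(
        covariates_path,
        types={'pheno': hl.tfloat64, 'age': hl.tfloat64},
        key='sample_id',
    )
    mt = mt.annotate_cols(cov=covs[mt.col_id])

    # Drop monomorphic variants, then standardize each variant's counts.
    mt = mt.annotate_rows(mean_x=hl.agg.mean(mt.x))
    mt = mt.filter_rows((mt.mean_x > 0) & (mt.mean_x < 2))
    p = mt.mean_x / 2  # estimated allele frequency
    normalized = (mt.x - mt.mean_x) / hl.sqrt(2 * p * (1 - p))

    # hl.pca takes a numeric entry expression; missing entries become 0.
    _, scores, _ = hl.pca(hl.or_else(normalized, 0.0), k=n_pcs)
    mt = mt.annotate_cols(scores=scores[mt.col_id].scores)

    # The intercept (1.0) must be listed explicitly among the covariates.
    covariates = [1.0, mt.cov.age] + [mt.scores[i] for i in range(n_pcs)]
    return hl.linear_regression_rows(
        y=mt.cov.pheno, x=mt.x, covariates=covariates
    )
```

The GWAS tutorial linked above uses `mt.GT.n_alt_alleles()` as the regression input; here the allele-count entry field plays the same role.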
Kumar, they are not talking about “HDFS”. It’s a 5, not an S.
After talking to John offline, I realized I made an error: you would first have to convert the HDF5 file into a format that Hail can import, e.g. a VCF file.
An HDF5 file importer is something we should add in the future, but unfortunately we haven’t yet.
Thanks for your reply!
I could use the `h5py` package to read the HDF5 file into Python and extract the genotype data as a NumPy matrix. Could I then somehow convert it to a Hail matrix table or another format that Hail can understand?
It’s not super easy to go directly from numpy => hail – it may be easier to go through a text intermediate and use
Thank you @tpoterba! I will try that.
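For anyone landing here later, the text-intermediate route could look roughly like the sketch below. It assumes the allele counts have already been pulled out of the HDF5 file (for example with `h5py`) as a variants-by-samples matrix with no missing values; transpose first if your file stores samples by variants. The helper names `write_matrix_tsv` and `import_counts` and the `variant_id` column are hypothetical. `hl.import_matrix_table` is an existing Hail 0.2 function, though the exact arguments your data needs may differ.

```python
import csv


def write_matrix_tsv(counts, variant_ids, sample_ids, path):
    """Write a variants x samples allele-count matrix as a tab-separated file.

    `counts` can be a NumPy array or a list of lists; each row is one variant.
    The header holds the row-field name followed by the sample IDs.
    """
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow(['variant_id'] + list(sample_ids))
        for vid, row in zip(variant_ids, counts):
            writer.writerow([vid] + [int(v) for v in row])


def import_counts(path):
    """Load the TSV as a Hail matrix table (requires Hail to be installed).

    Entries end up in the field `x` and sample IDs in the column key `col_id`.
    """
    import hail as hl  # imported lazily so the writer works without Hail

    return hl.import_matrix_table(
        path,
        row_fields={'variant_id': hl.tstr},
        row_key='variant_id',
        entry_type=hl.tint32,
    )


# Example usage with a tiny made-up matrix (2 variants, 3 samples):
# write_matrix_tsv([[0, 1, 2], [2, 2, 0]], ['v1', 'v2'],
#                  ['s1', 's2', 's3'], 'counts.tsv')
# mt = import_counts('counts.tsv')
```

Writing one line per variant keeps rows aligned with Hail's row axis, so the imported table has variants as rows and samples as columns.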