Hi, I’m new to Hail. I would like to run a GWAS (linear regression), but my genotype data is stored in an HDF5 file. The HDF5 file contains every sample’s allele count at each locus. I think it should be straightforward to use these allele counts to run a GWAS in Hail, but I’m not sure about the following:
- How can I read HDF5 into Hail? Using the `h5py` Python package, I can read the `.hdf5` file into Python (`h5py.File(xx.hdf5,'r')`), but how could I pass the allele counts in this HDF5 file to Hail?
- Supposing it is possible to pass the data to Hail, how do I run a GWAS using allele counts (which are 0, 1, or 2)? I’ll also need to compute PCA; can I do that in Hail using allele counts? And how do I include other covariates stored in a separate file?
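To make the question concrete, here is a rough sketch of the workflow I have in mind. I am not sure it is the right approach. The dataset names (`allele_counts`, `variant_ids`, `sample_ids`), the covariate columns (`s`, `trait`, `age`), and the helper names are placeholders I made up for illustration, and I assume the IDs are stored as byte strings and the matrix is variants by samples.

```python
def to_long_rows(counts, variant_ids, sample_ids):
    """Flatten a variants x samples allele-count matrix into one dict per entry."""
    rows = []
    for i, variant in enumerate(variant_ids):
        for j, sample in enumerate(sample_ids):
            rows.append({"variant": variant, "s": sample, "ac": int(counts[i][j])})
    return rows


def run_gwas(h5_path, covariate_path, k=10):
    """Sketch: HDF5 allele counts -> Hail MatrixTable -> PCA -> linear regression."""
    # Imported here so the reshaping helper above works without these packages.
    import h5py
    import hail as hl

    with h5py.File(h5_path, "r") as f:
        counts = f["allele_counts"][:]  # placeholder dataset name
        variant_ids = [v.decode() for v in f["variant_ids"][:]]  # assumes bytes
        sample_ids = [s.decode() for s in f["sample_ids"][:]]  # assumes bytes

    # Build a long-format Table, then pivot it into a MatrixTable.
    # This goes through local Python objects, so it likely only suits small data.
    ht = hl.Table.parallelize(
        to_long_rows(counts, variant_ids, sample_ids),
        hl.tstruct(variant=hl.tstr, s=hl.tstr, ac=hl.tint32),
    )
    mt = ht.to_matrix_table(row_key=["variant"], col_key=["s"])

    # Join covariates from a separate delimited text file, keyed by sample ID.
    cov = hl.import_table(covariate_path, impute=True, types={"s": hl.tstr}, key="s")
    mt = mt.annotate_cols(cov=cov[mt.s])

    # PCA on the raw counts; centering or normalizing first may be preferable.
    _, scores, _ = hl.pca(hl.float64(mt.ac), k=k)
    mt = mt.annotate_cols(pcs=scores[mt.s].scores)

    # The intercept (1.0) has to be listed explicitly among the covariates.
    return hl.linear_regression_rows(
        y=mt.cov.trait,
        x=hl.float64(mt.ac),
        covariates=[1.0, mt.cov.age] + [mt.pcs[i] for i in range(k)],
    )
```

I would then call something like `run_gwas("xx.hdf5", "covariates.tsv")` (the covariate file name is also a placeholder). For a large file I suspect I would need to write the counts out in another format rather than go through Python lists, which is part of what I am asking.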
Here is one HDF5 file that you can check:
Thank you very much for your help!