I am trying to calculate polygenic risk scores (PRS/PGS) for UK Biobank data (imputed bgen files) using pre-computed scores from the PGS Catalog available online.
Basically, I am trying to write a script that takes two inputs: a file of pre-computed scores from the PGS Catalog (for example, https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/Harmonized/PGS000001_hmPOS_GRCh37.txt.gz) and the UKB bgen files (one per chromosome). The script would then match the variants in the PGS Catalog file to those present in UKB and, for each individual, sum the beta values (from the PGS Catalog file) across the matched variants to obtain that individual's PGS.
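To make the mapping-and-sum step concrete, here is a minimal sketch in plain Python (not Hail, bgenix, or plink). It assumes the scoring file has the columns `hm_chr`, `hm_pos`, `effect_allele` and `effect_weight` and that header lines start with `#`. The genotype side is a hypothetical in-memory list of dicts with alt-allele dosages, standing in for whatever a bgen reader would return. Each beta is weighted by the individual's effect-allele dosage, which is the usual way a PGS is summed. Strand flips, ambiguous A/T and C/G SNPs, and multi-allelic sites are not handled.

```python
import csv
import io


def read_weights(handle):
    """Read a scoring file into {(chrom, pos): (effect_allele, beta)}.

    Assumes '#'-prefixed header lines and tab-separated columns named
    hm_chr, hm_pos, effect_allele, effect_weight (an assumption here).
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    weights = {}
    for row in reader:
        if not row["hm_chr"] or not row["hm_pos"]:
            continue  # variant has no harmonized position
        key = (row["hm_chr"], int(row["hm_pos"]))
        weights[key] = (row["effect_allele"], float(row["effect_weight"]))
    return weights


def score_samples(weights, variants, n_samples):
    """Sum beta * effect-allele dosage per sample over matched variants.

    `variants` is a hypothetical stand-in for decoded bgen records: dicts
    with chrom, pos, ref, alt and per-sample alt-allele dosages (0..2).
    """
    scores = [0.0] * n_samples
    matched = 0
    for v in variants:
        hit = weights.get((v["chrom"], v["pos"]))
        if hit is None:
            continue  # variant not in the scoring file
        effect_allele, beta = hit
        if effect_allele == v["alt"]:
            dosages = v["alt_dosage"]
        elif effect_allele == v["ref"]:
            # Effect allele is the reference allele: flip the dosage.
            dosages = [2.0 - d for d in v["alt_dosage"]]
        else:
            continue  # alleles do not match; skipped in this sketch
        matched += 1
        for i, d in enumerate(dosages):
            scores[i] += beta * d
    return scores, matched


# Toy data only; not taken from PGS000001 or UKB.
toy_scoring = io.StringIO(
    "#pgs_id=TOY\n"
    "rsID\thm_chr\thm_pos\teffect_allele\teffect_weight\n"
    "rs1\t1\t100\tA\t0.5\n"
    "rs2\t1\t200\tG\t-0.25\n"
    "rs3\t2\t300\tT\t1.0\n"
)
toy_variants = [
    {"chrom": "1", "pos": 100, "ref": "G", "alt": "A", "alt_dosage": [0.0, 1.0, 2.0]},
    {"chrom": "1", "pos": 200, "ref": "G", "alt": "C", "alt_dosage": [2.0, 1.0, 0.0]},
    {"chrom": "2", "pos": 999, "ref": "C", "alt": "T", "alt_dosage": [1.0, 1.0, 1.0]},
]

weights = read_weights(toy_scoring)
scores, matched = score_samples(weights, toy_variants, n_samples=3)
print(scores, matched)
```

Keying on chromosome and position rather than rsID is a design choice for the sketch, since the harmonized file provides positions on a fixed build. A real pipeline would also want to report how many scoring-file variants were not matched.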
Here is a bgenix and plink version of what I am considering doing, except that I would use Hail instead: PRS Pipeline
Is there something similar already available?
Do you think it is sensible to use Hail for this use case, or should I stick with bgenix or plink? As I hope I have explained, I am not trying to develop or validate PRS scores; I just want to do a “mapping of the variants” and use the scores already computed in the PGS Catalog.
Thank you in advance!