Hi!
I’d like to ask for a recommendation, as I’m not a very experienced developer. I apologize if this is not the right place for this.
Given a hypothetical smallish VCF and dataframe-like phenotype data stored in Postgres (ca. 5k people x (WES + 1k phenotypes)), I would like to develop a dashboard-like web app with a simple, nice-looking interface. The purpose of the app would be to enable simple exploration of the data by selecting/filtering/grouping/aggregating both vertically and horizontally, and to present the results as explicit tables, bar plots, histograms etc., with as small a time delay as possible. Example queries would be: 1) the total number of deletions on chrX in a particular individual, 2) a list of all variants from a predefined list that a given individual has, 3) the mean number of individuals with a specific genotype at a specific locus. I imagine some aggregate statistics could be precomputed, but most queries would need to be computed ad hoc. Additionally, performing GWAS-like analyses (hypothesis testing, dimensionality reduction; some general ML would be nice) should be possible on the backend, not necessarily in real time, and their results would be presented in the app as well. The data does not change other than being incrementally updated once a week. The app would serve the results to anyone with Internet access (and appropriate credentials).
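To make the three example queries concrete, here is a rough sketch of how I picture them against a toy long-format table (one row per sample and variant call). It uses Python's built-in `sqlite3` only as a stand-in for Postgres. The table layout, the column names and the helper functions are all made up for illustration, `gt` is assumed to hold the number of alternate alleles, and query 3 is simplified to a plain count of individuals.

```python
import sqlite3

# Hypothetical toy data: (sample_id, chrom, pos, ref, alt, gt),
# where gt is the number of alternate alleles (0, 1 or 2).
ROWS = [
    ("S1", "X", 100, "ATG", "A", 1),  # deletion, carried
    ("S1", "X", 200, "C", "T", 2),    # SNV, carried
    ("S1", "X", 300, "GCC", "G", 0),  # deletion, hom-ref (not carried)
    ("S1", "X", 400, "TAA", "T", 2),  # deletion, carried
    ("S1", "1", 500, "A", "G", 1),
    ("S2", "X", 100, "ATG", "A", 0),
    ("S2", "1", 500, "A", "G", 2),
    ("S3", "1", 500, "A", "G", 1),
]


def build_toy_db():
    """Create an in-memory database holding the toy calls table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (sample_id TEXT, chrom TEXT, pos INTEGER, "
        "ref TEXT, alt TEXT, gt INTEGER)"
    )
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def count_deletions(conn, sample_id, chrom):
    """Query 1: deletions (ref longer than alt) carried by one individual."""
    sql = (
        "SELECT COUNT(*) FROM calls "
        "WHERE sample_id = ? AND chrom = ? AND gt > 0 "
        "AND length(ref) > length(alt)"
    )
    return conn.execute(sql, (sample_id, chrom)).fetchone()[0]


def carried_from_list(conn, sample_id, wanted):
    """Query 2: which variants from a predefined list an individual carries."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (chrom TEXT, pos INTEGER)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", wanted)
    sql = (
        "SELECT c.chrom, c.pos FROM calls c "
        "JOIN wanted w ON c.chrom = w.chrom AND c.pos = w.pos "
        "WHERE c.sample_id = ? AND c.gt > 0 "
        "ORDER BY c.chrom, c.pos"
    )
    return conn.execute(sql, (sample_id,)).fetchall()


def count_with_genotype(conn, chrom, pos, gt):
    """Query 3 (simplified): individuals with a given genotype at a locus."""
    sql = (
        "SELECT COUNT(DISTINCT sample_id) FROM calls "
        "WHERE chrom = ? AND pos = ? AND gt = ?"
    )
    return conn.execute(sql, (chrom, pos, gt)).fetchone()[0]
```

With real data the table would have tens of millions of rows, so the question is really whether this kind of layout (plus indexes on `sample_id` and on `chrom, pos`) can stay interactive, or whether a different storage format is the better choice.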
Hail seems like a great tool for importing and manipulating the genotype data, running GWAS, and exporting the data to Spark. On the frontend, something like the Python library Dash seems appropriate.
Given the above lengthy confession, here are my questions:
Do you have any recommendations? Is this even doable? What technologies should I use? Should I put the VCF in a database? If I understand correctly, Hail’s native data format can also be used to persist the data on disk and query it lazily? Additionally, I’ve been instructed to look into Parquet and Cassandra. Any suggestions or reactions from the Hail community will be appreciated!