Wishing you first and foremost a merry Christmas and happy and healthy New Year.
By “that file” (just for my own understanding), do you mean that all the data is stored within a single massive file?
Hopefully these answers are sufficient:
Can you describe a typical user interaction with this service? A user would only be able to do certain basic analysis steps (such as the ones in “Let’s do a GWAS”) via a web-GUI. Hence the utility of having a REST API layer sit between Hail and a UI.
Do they start by uploading their data to the service or are all the datasets already present in the service? The latter: the data available to the user is whatever is present in the database at the moment of the query. That being said, fresh data would be added regularly.
How does a user identify/refer-to a dataset? The user would be querying the totality of the dataset that fits the filtering criteria.
How does a user describe a variant filter, do they receive an identifier that refers to the filtered dataset, or does every operation take filtering arguments? I’m not certain I fully understand the question, but I’m leaning towards the latter: a series of filters is applied to the dataset until the user reaches a number of subjects with acceptable phenotypes and allele frequencies…
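To make the “latter” option concrete, here is a minimal, purely illustrative sketch (not Hail code; the function and parameter names such as `filter_variants`, `max_af`, and `min_call_rate` are all hypothetical) of the stateless style in which every request carries its own filtering arguments, so the server never has to create or track a handle to a “filtered dataset”:

```python
# Illustrative sketch only: each request supplies its own filter arguments
# and is applied against the full dataset; no server-side filtered-dataset
# identifier is ever created. All names here are hypothetical.

def filter_variants(records, max_af=None, min_call_rate=None):
    """Apply the filters given in this request to the full dataset."""
    result = records
    if max_af is not None:
        result = [r for r in result if r["af"] <= max_af]
    if min_call_rate is not None:
        result = [r for r in result if r["call_rate"] >= min_call_rate]
    return result

# Example: a rare-variant query over a toy dataset.
dataset = [
    {"id": "rs1", "af": 0.40,  "call_rate": 0.99},
    {"id": "rs2", "af": 0.004, "call_rate": 0.95},
    {"id": "rs3", "af": 0.002, "call_rate": 0.80},
]
rare = filter_variants(dataset, max_af=0.01, min_call_rate=0.90)
print([r["id"] for r in rare])  # prints ['rs2']
```

In a real deployment the same pattern would translate to the REST layer passing the request’s filter arguments straight into Hail expressions (e.g. row filters on allele frequency), with each query evaluated against the current contents of the database.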
Again, I really appreciate these in-depth questions and the time you took to respond to them all!