Web API projects

Hi all,
Do you know of any projects developing a web API layer for Hail? Open GitHub projects preferred of course :wink:

Yep! It's in progress, and it's my focus right now. If you have any specific feature requests pertaining to the app, I'm very interested!

Alex

Hi Alex,
I absolutely have some input. Would you like me to share on this forum or would you prefer that I message you directly?

Best,
Daniel

We'd like to keep everything public if possible - the whole community benefits that way!

Yes, as Tim suggests, please share with everyone. To help keep things organized, maybe tag it #web.

Tim, it seems I can't make categories yet. Could we make a category for web efforts? (#web or something similar). It seems distinct enough from other aspects of Hail development to maybe warrant separation.

Alex, I think I gave you privileges - can you try again?

If it's a development chat, we may want to move it to http://dev.hail.is, though.

Well here goes then :slight_smile:

  • The import / export data functions are the most important to me; there is obviously a need to import metadata as well. I'm not sure whether the best format for the latter is TSV, CSV, or JSON.
    – Import of VCF and export to PLINK is most important. The capacity to import Nirvana-annotated JSON files would be awesome.
  • The second most important function for me, once data is in the system, is the capacity to generate PCA plots.
  • Then I would like to see filtering functions, based on metadata or on specific criteria such as the ones in the "Let's do a GWAS" tutorial/example.
    These are the basic functionalities I would love to see in a web API layer.
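
For concreteness, here's roughly how I picture those operations in Hail's Python API - just a sketch, with placeholder paths and field names, assuming the 0.2 interface:

import hail as hl

# Import genotypes from a VCF and sample metadata from a TSV (placeholder paths).
mt = hl.import_vcf('/data/cohort.vcf.bgz')
meta = hl.import_table('/data/metadata.tsv', key='sample_id')
mt = mt.annotate_cols(pheno=meta[mt.s])

# Filter on metadata or on variant-level criteria, as in the GWAS tutorial.
mt = mt.filter_cols(mt.pheno.cohort == 'cases')
mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)

# PCA, e.g. for population-structure plots.
eigenvalues, scores, _ = hl.hwe_normalized_pca(mt.GT, k=10)

# Export the filtered dataset to PLINK format (placeholder output prefix).
hl.export_plink(mt, '/data/cohort_filtered')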

Thank you so much for hearing me out!
Daniel

Thanks for the post! Can you share a bit more about the users you're envisioning for this kind of system? It sounds like this could make it possible for people without as much programming experience to analyze big genetic data.

Hi Tim and Alex,
I'm not certain if that really is the intent or if I really understood the project that Alex is working on.

To be clear, I'm just looking for REST API endpoints to help perform the tasks I listed - not a web interface for doing so…

So, out of the box, there will still be a need for programming skills to make the APIs useful to the non-programmer.

Does that make sense?

best,

daniel

Something we are actively working on (but somewhat separately from Alex's current pull request) is totally abstracting Hail's backend and frontend. The backend will be a system that executes serialized JSON-like queries and returns data to whatever frontend is contacting it.
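
Purely as an illustration - this is not the actual query format, which hasn't been pinned down - such a serialized query could be something like a nested, JSON-serializable payload:

# Hypothetical shape only; every field name and path here is made up.
query = {
    'dataset': 'gs://my-bucket/cohort.mt',
    'operations': [
        {'op': 'filter_rows', 'expr': 'variant_qc.AF[1] > 0.01'},
        {'op': 'hwe_normalized_pca', 'k': 5},
    ],
    'result': 'scores',
}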

We'll be setting up something like this for Broad internal use as a proof-of-concept in the next year, and it would certainly be possible for you to use a similar system when that's ready.

The system you're imagining is that your organization would stand up a running cluster behind a web API, and you'd build various systems against that?

Yes - that's exactly it; I am looking to run Hail as a headless app and access its querying and reporting functionality through APIs. The graphical front end used to structure the API calls should be irrelevant.
If I were to build a proof of concept of this type of system (read: light functionality, only the features described in my previous message), what sort of effort do you think would be required? Also, do you think the current version of Hail is mature enough to support this kind of use case?

Many thanks again for your expert advice!
Daniel

Hi guys,
Any thoughts on the hours of effort required to build out the API layer? It would be really helpful for me to have this info before the end of the quarter (2018)…

Cheers

Separating our frontend and backend entirely will probably be done by the end of Q1 2019. But you could set up a Flask server running Hail on a tiny dataset in a couple of hours, I'd think…

I still don't have a great grasp on the set of things you want to use this web API for. Maintaining a running cluster for just a few users would be expensive; there are economies of scale here.

@lasuperclasse, I'm also curious about your use-case. Would your users be happy with a JupyterHub notebook hosted on the Spark leader node? They could share a single cluster, and they wouldn't need anything but a web browser.

It seems like your customers / users are programmers / software services? I think you could write a simple Flask app that wrapped calls to Hail's Python API, but serializing large VCFs and output CSVs doesn't seem great.

A small Flask app that loaded a known dataset, ran PCA, and returned the image would be pretty simple:

import hail as hl
from flask import Flask

app = Flask(__name__)

# map short dataset names to Hail MatrixTable paths
datasets = {'foo': '/datasets/foo.mt', ...}

@app.route('/pca/<dataset>')
def pca(dataset):
    fname = datasets.get(dataset)
    if fname is None:
        return 'no such dataset', 404
    # run HWE-normalized PCA on the genotype calls
    eigenvalues, scores, loadings = hl.hwe_normalized_pca(
        hl.read_matrix_table(fname).GT, k=5)
    return ???  # whatever serialization of the results you want

I don't know exactly what you want to return, but that's like 7 lines to a PCA service.
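
And to exercise it locally (assuming you save the above as app.py and start it with something like FLASK_APP=app.py flask run; the dataset name 'foo' is just the placeholder from the dict above):

import requests

# Hit the hypothetical PCA endpoint on a locally running instance.
resp = requests.get('http://localhost:5000/pca/foo')
print(resp.status_code, resp.text)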

Thanks for getting back to me.
I don't understand why the data set needs to be tiny if running Flask to serve up some results.
By tiny do you mean <100k samples, or more like <100 samples? Isn't the handicap of using Flask more relevant to the number of requests served rather than the size of the dataset?

I agree that the economies of scale come into play only when you have a sufficient number of users to make the maintenance worthwhile…

I don't understand why the data set needs to be tiny if running Flask to serve up some results.

Do you intend that after receiving a request, the server will run Hail on some data? Or just serve some set of static precomputed results?

You mentioned earlier:

I imagine that it is possible to transfer 100,000 whole genomes through Flask and into a networked file system. You can then use your Hail cluster to manipulate that file. However, there are almost certainly better ways to transfer 100,000 whole genomes from the client machine to a networked file system that the Hail cluster can access.
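
For example (just one possibility; the bucket and paths below are placeholders), you could copy the VCF straight to object storage that the cluster can read, and import it there rather than streaming it through the web server:

import hail as hl

# Copy a local VCF to storage the cluster can reach (placeholder paths).
hl.hadoop_copy('file:///local/data/cohort.vcf.bgz', 'gs://my-bucket/cohort.vcf.bgz')

# Import it on the cluster and persist it in Hail's native format.
mt = hl.import_vcf('gs://my-bucket/cohort.vcf.bgz')
mt.write('gs://my-bucket/cohort.mt', overwrite=True)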

I don't know much about your environment or use-case, so it is hard to make recommendations. Can you describe a typical user interaction with this service? Do they start by uploading their data to the service, or are all the datasets already present in the service? How does a user identify / refer to a dataset? How does a user describe a variant filter? Do they receive an identifier that refers to the filtered dataset, or does every operation take filtering arguments?

It would be more along the lines of option 1: after receiving a request, the server would run Hail on the data.
There would be regular additions of data to the database (say daily or weekly), and various analysis requests would be performed ad hoc.
There would be very few static precomputed results, other than maybe some meta-information, such as database sizes since the last update.
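
Just to make that concrete, the recurring ingest step could be a small scheduled job along these lines (a rough sketch; the paths are placeholders, and merging batches into one cohort is its own design question):

import datetime
import hail as hl

# Hypothetical weekly ingest: import the new batch of samples and write it
# out as a dated MatrixTable that the analysis endpoints can read.
today = datetime.date.today().isoformat()
batch = hl.import_vcf('/incoming/new_batch.vcf.bgz')
batch.write('/warehouse/batch-{}.mt'.format(today), overwrite=True)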