Building CorrMapper was one of the hardest things I’ve ever done. I had no idea how many pieces I would need to fit together to turn my bioinformatics pipeline into a functioning web app. I learned a lot from it, and I wanted to spare my fellow scientists the pain and the steep learning curve, since they might not want to spend a full week of their life trying to get an upload form to work properly. Yeah.. those were fun times..
Here’s the simple idea behind Science Flask:
Notice how everything in blue is not specific to any scientific app. So why do we keep re-developing it? Ideally, scientists should not have to work on anything other than the green bit. They could then plug that into Science Flask with a few hours of work and have their tool online in a day or two, instead of weeks.
Here’s an example app built with it, and here’s the GitHub repo with some lengthy docs about the structure of the project. I also wrote a step-by-step guide to deploying your Science Flask app on AWS.
Science Flask comes batteries included with the following components:
- User management: Users are only allowed to register with a valid academic email address. This is to ensure that your tool is used mainly for academic and research purposes rather than commercial ones. It also comes with all the usual extras: email addresses are confirmed, users can change their passwords, request a password reset if they forget it, etc. Thanks to Flask-Security, you can also assign roles to different users and easily build custom user management logic. For example, you might decide that certain users can only use part of the application, while others can access all features.
- SQL database: All user, study and analysis data is stored in an SQLite database by default. This can be changed to MySQL or PostgreSQL easily and the same code will work, thanks to SQLAlchemy. And thanks to Flask-Migrate, if you change your app’s model you can easily upgrade your database, even when your app is already deployed.
- Admin panel: The model of your app (the database tables and the relations between them) can be easily edited online, from anywhere, using CRUD operations. Thanks to Flask-Admin, setting up an admin user who can edit users and other database tables is as simple as modifying two lines in the config file.
- Upload form: Getting data from the user sounds super simple, but you’d be surprised how long it takes to get a decent upload page working. It’s also very easy to build complex form logic from the bricks Science Flask provides.
- Profile page: This collects each user’s uploaded studies and lets them submit analyses on their data.
- Analysis form: Just like with the upload form, you can build custom logic to ensure you get the parameters from the user just right. The analysis job is then submitted to the backend, which uses Celery. Once the analysis is ready, the user is notified by email. They can then download their results or check them out online.
- Logging: All errors and warning messages are sent to the admins via email. All analysis exceptions and errors can be caught so that the program crashes gracefully, letting the user know what happened.
- Runs on Bootstrap.css: Modern, mobile-friendly, responsive. Bootstrap makes writing good-looking HTML pages dead easy.
- Tool tips and tours: Explain to the user how your application works with interactive tours (available on all the above listed pages) and tooltips.
- Python3: The whole project is written in Python3.5 (because it’s 2017).
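The database swap mentioned in the list above really does come down to a single config value. Here is a minimal sketch (the URIs and credentials are placeholders, not values from the project):

```python
# config.py -- the same SQLAlchemy model code runs against any of these
# backends; only the connection URI changes (user/pw below are placeholders).
SQLALCHEMY_DATABASE_URI = "sqlite:///app.db"                      # default
# SQLALCHEMY_DATABASE_URI = "mysql://user:pw@localhost/app"       # MySQL
# SQLALCHEMY_DATABASE_URI = "postgresql://user:pw@localhost/app"  # PostgreSQL
SQLALCHEMY_TRACK_MODIFICATIONS = False
```

After a model change, Flask-Migrate’s CLI (`flask db migrate -m "describe change"` followed by `flask db upgrade`) generates and applies the schema migration against whichever backend the URI points at.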
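The error-to-email logging described above can be wired up with Python’s standard `SMTPHandler`. This is a generic sketch with placeholder addresses, not Science Flask’s exact setup:

```python
import logging
import logging.handlers

# Placeholder addresses -- substitute your own SMTP server and admin list.
ADMINS = ["admin@example.com"]

logger = logging.getLogger("science_flask")
logger.setLevel(logging.INFO)

# Email every ERROR (and worse) to the admins. Nothing is sent until an
# error is actually logged, so this setup alone has no side effects.
mail_handler = logging.handlers.SMTPHandler(
    mailhost=("localhost", 25),
    fromaddr="server@example.com",
    toaddrs=ADMINS,
    subject="Science Flask application error",
)
mail_handler.setLevel(logging.ERROR)
logger.addHandler(mail_handler)
```

Wrapping the analysis code in a `try`/`except` that calls `logger.exception(...)` then both alerts the admins and lets the app return a friendly error page to the user.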
The example app is very simple. It does the following:
- Users can register with an academic email address.
- Upload one or two datasets as .csv or .txt files.
- A series of checks are performed on the uploaded datasets:
  - all columns have to be numerical
  - each dataset must have a feature and sample count between a predefined minimum and maximum (see config.py)
  - if the user uploads two datasets, they need a minimum number of intersecting samples
  - missing values are imputed with their column-wise median
- Then the user can submit an analysis and select how many of the highest-variance columns should be kept from each dataset.
- These features are used to calculate a correlation matrix between them.
- If there’s only one dataset uploaded, the correlations are calculated between the features of that one dataset. If two datasets are uploaded, three matrices/plots are produced: two for the features of the individual datasets, and another that shows the correlations between the features of the two disparate datasets.
- The p-values of the resulting correlation matrix are filtered using one of the user-selected corrections for multiple testing: Bonferroni or Benjamini-Hochberg. The user can also specify the alpha level for hypothesis testing. Only correlations that pass both of these filters will be displayed.
- The tables and heatmaps of correlations can be downloaded by the user or viewed online.
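The academic-address check from the first step could be sketched as a simple pattern match. The patterns below are illustrative assumptions (.edu and .ac.xx domains); the real list of accepted domains is up to you:

```python
import re

# Hypothetical rule: accept .edu addresses and two-letter .ac.xx domains.
ACADEMIC_PATTERN = re.compile(r"@[\w.-]+\.(edu|ac\.[a-z]{2})$")

def is_academic_email(address):
    """Return True if the address looks like an academic one."""
    return bool(ACADEMIC_PATTERN.search(address.lower()))
```

A registration form validator can then simply reject any address for which this returns `False`.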
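The upload checks listed above can be sketched in a few lines of pandas. The bounds here are made-up stand-ins for the values in config.py, and the function names are mine, not the project’s:

```python
import numpy as np
import pandas as pd

# Hypothetical bounds -- the real ones live in config.py.
MIN_SAMPLES, MAX_SAMPLES = 3, 1000
MIN_FEATURES, MAX_FEATURES = 2, 500

def validate_dataset(df):
    """Run the upload checks on one dataset and impute missing values."""
    if not all(np.issubdtype(t, np.number) for t in df.dtypes):
        raise ValueError("all columns must be numerical")
    n_samples, n_features = df.shape
    if not MIN_SAMPLES <= n_samples <= MAX_SAMPLES:
        raise ValueError("sample count out of bounds")
    if not MIN_FEATURES <= n_features <= MAX_FEATURES:
        raise ValueError("feature count out of bounds")
    # impute missing values with their column-wise median
    return df.fillna(df.median())

def check_overlap(df1, df2, min_shared=3):
    """Require a minimum number of intersecting samples across datasets."""
    shared = df1.index.intersection(df2.index)
    if len(shared) < min_shared:
        raise ValueError("too few intersecting samples")
    return shared
```

Running these checks at upload time means the analysis backend only ever sees clean, fully numerical data.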
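The two correction methods in the filtering step can be implemented in a few lines of NumPy. This is a generic sketch of Bonferroni and Benjamini-Hochberg filtering, not CorrMapper’s exact code:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject hypotheses whose p-value clears the Bonferroni-adjusted level."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / pvals.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject hypotheses via the Benjamini-Hochberg step-up procedure."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    # compare the k-th smallest p-value against alpha * k / m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # reject everything up to the largest rank that passes
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject
```

Either boolean mask, applied to the flattened matrix of correlation p-values, leaves exactly the correlations that get displayed.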
I hope it’ll be useful for someone. I definitely would have loved something like this when I started developing CorrMapper. And if you have an idea to improve it, please contribute to the project!