Frequently Asked Questions

Helix has many applications; its current users work in the clinical, pharmacological and health sectors, and include both commercial and academic organizations.
Helix can be used for human missense variant analysis in clinical diagnostics, disease association and patient stratification. Helix provides in-depth missense variant reports as well as structured data.
Helix data can be accessed through our website or our API. Under special circumstances, we can also supply flat-file access to all SNPs (single nucleotide polymorphisms) in the Helix database.
  • The website contains in-depth variant reports, with literature references and explanations of the predictions made by Helix. Samples can be viewed in our portfolio.
  • The API gives structured access to Helix data, including predictions for every missense variant in the human exome. Bulk methods are available for higher throughput.
  • Under special circumstances, we supply clients with a flat file containing details of all SNPs in the Helix database, for bulk access.
You can test Helix on your data using our Comparison tool. This gives you insight into Helix's performance on your problem compared with other pathogenicity predictors. We do not save any data submitted to the Comparison tool.
Currently, Helix predicts pathogenicity for missense variants only; the web application, API and flat file include only those variants. We are integrating methods for indel prediction, so contact us if you are interested. CNVs and non-coding SNPs are not something we are focusing on for now.
Helix scores range from 0 to 1 (equivalently, 0% to 100%). Helix is optimized for a cutoff of 0.5 (50%): scores below 0.5 indicate a benign prediction, and scores above 0.5 indicate a pathogenic prediction. This cutoff yields the best balance between precision and recall.
For specific purposes, you may prefer to favor either precision or recall. We encourage you to contact us with your use case, and we will supply you with an in-depth analysis of the best cutoff for you.
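To make the precision/recall trade-off concrete, here is a minimal Python sketch that computes both metrics at an arbitrary cutoff. The labels and scores are made up for illustration (1 = pathogenic, 0 = benign), the function name `precision_recall` is hypothetical and not part of the Helix API, and the sketch assumes a score exactly equal to the cutoff is called pathogenic, a case not specified above.

```python
from typing import Sequence, Tuple


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall when scores >= cutoff are called pathogenic (label 1)."""
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        predicted_pathogenic = score >= cutoff  # assumption: ties count as pathogenic
        if predicted_pathogenic and label == 1:
            tp += 1
        elif predicted_pathogenic and label == 0:
            fp += 1
        elif not predicted_pathogenic and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.2, 0.7]

for cutoff in (0.3, 0.5, 0.65):
    p, r = precision_recall(labels, scores, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the cutoff from 0.3 to 0.65 moves precision from 0.75 to 1.00 while recall drops from 1.00 to 0.67; where to sit on that curve depends on the use case.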

MCC (Matthews correlation coefficient) is a metric that measures the quality of binary predictions, which makes it a natural fit for pathogenicity datasets, whose labels are binary (pathogenic or benign). ROC AUC (the area under the receiver operating characteristic curve) also measures binary prediction performance, but does so across every possible cutoff rather than at a single one.
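For reference, MCC is computed from the confusion-matrix counts (true/false positives TP/FP, true/false negatives TN/FN) as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). The plain-Python sketch below applies it at a fixed cutoff; the function names and toy data are hypothetical, and returning 0 when a denominator factor is zero (for example, when every variant is predicted pathogenic) is a common convention rather than part of the definition. scikit-learn's `sklearn.metrics.matthews_corrcoef` computes the same statistic from label vectors.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], cutoff: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling scores >= cutoff pathogenic (label 1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(labels: Sequence[int], scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Matthews correlation coefficient at a fixed cutoff."""
    tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal total of the confusion matrix is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
print(mcc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # perfect separation: 1.0
print(mcc([1, 0, 1, 0], [0.9, 0.7, 0.8, 0.6]))  # everything called pathogenic: 0.0
print(mcc([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.2, 0.7]))  # 1/6, about 0.167
```

The second call mirrors a predictor that places every variant above the cutoff: its MCC is 0 no matter how well it orders the variants.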

ROC AUC is a valuable measure when tuning the prediction cutoff is straightforward. However, because genes vary widely in their properties, we find that this is difficult in practice. We therefore use MCC to evaluate predictor performance with the cutoff fixed at 0.5.
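The difference between the two metrics can be sketched in a few lines of Python. ROC AUC equals the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one (ties counting one half), so it depends only on the ordering of the scores, never on where the cutoff sits. The function names and toy scores below are hypothetical; the pairwise loop is chosen for clarity, and real code would typically use a rank-based method such as `sklearn.metrics.roc_auc_score`.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of pathogenic (1) vs benign (0) scores."""
    pathogenic = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pathogenic) * len(benign))


def fraction_called_pathogenic(scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of variants scoring at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


# Hypothetical toy data: the same ranking, once centred on 0.5 and once shifted up.
labels = [1, 1, 0, 0]
centred = [0.7, 0.6, 0.4, 0.3]
shifted = [0.95, 0.9, 0.8, 0.7]

print(roc_auc(labels, centred), roc_auc(labels, shifted))  # 1.0 1.0
print(fraction_called_pathogenic(centred), fraction_called_pathogenic(shifted))  # 0.5 1.0
```

Both score sets have a perfect ROC AUC, yet at the fixed 0.5 cutoff the shifted set calls every benign variant pathogenic. Only re-tuning the cutoff would recover its performance, which is exactly the step that is difficult in practice.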

Illustrated below are two prediction landscapes. The X axis corresponds to the residue number of BRCA1, while the Y axis shows the prediction value for Helix (v4.0.1) and REVEL. Red points indicate pathogenic variants, blue points indicate benign variants. Grey points are variants of unknown significance.
A red horizontal line in the middle of each plot marks the 0.5 cutoff. A perfect predictor would place all red points above the line and all blue points below it. Box plots show the distribution of predictions for each category.

Helix gene landscape for BRCA1.

REVEL gene landscape for BRCA1.

REVEL places the vast majority of points above the red line, predicting almost all variants as pathogenic. Helix distributes them more evenly (though not perfectly; BRCA1 is a notoriously difficult gene to predict), which results in better accuracy and a better MCC for Helix.

In this case, it is possible to move the cutoff for REVEL to obtain a better MCC or accuracy score; ROC AUC roughly captures this, since it summarizes performance across all cutoffs. While such tuning is feasible for BRCA1, which is well studied, most genes are not as well characterized: fewer than 10% of all human genes are associated with more than 10 known variants, and most of these variants are benign. We consider adjusting cutoffs based on such sparse data an error-prone exercise. We therefore evaluate predictors at their recommended cutoffs using MCC.
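Per-gene cutoff tuning amounts to a search like the minimal Python sketch below, which tries each observed score as a candidate cutoff and keeps the one with the highest MCC. The function names and the four-variant dataset are hypothetical. With so few labelled variants, the chosen cutoff simply fits the known examples, which is why this exercise is error-prone for sparsely characterized genes.

```python
import math
from typing import Sequence, Tuple


def mcc_at(labels: Sequence[int], scores: Sequence[float], cutoff: float) -> float:
    """MCC when scores >= cutoff are called pathogenic (label 1); 0 if undefined."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def best_cutoff(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try every distinct observed score as a cutoff; return the (cutoff, MCC) with max MCC."""
    candidates = sorted(set(scores))
    return max(((c, mcc_at(labels, scores, c)) for c in candidates), key=lambda pair: pair[1])


# Hypothetical gene with only four labelled variants, all scored above 0.5.
labels = [1, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7]

print(mcc_at(labels, scores, 0.5))  # 0.0 at the recommended cutoff
print(best_cutoff(labels, scores))  # (0.9, 1.0) after tuning on these same four variants
```

A perfect MCC on the variants used for tuning says little about how the same cutoff behaves on the many unlabelled variants in that gene.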

Apart from Helix's superior performance, using Helix has a number of advantages over other predictors:
  • Helix is available for every missense variant in the human exome.
  • Helix provides in-depth variant reports with access to literature for every variant.
  • Helix is consistently updated, unlike many other predictors, which are often abandoned once their research project is finished.
  • You can always contact the Helix team for information and requests related to predictions - and expect a timely response.
  • Helix predictions are constantly improving, as novel techniques are incorporated into our predictor.
Yes, please get in touch to discuss options.

Is your question not listed? Don't hesitate to contact us!