Helix Labs | About

Helix has many applications, but its current users work in the clinical, pharmacological and health sectors. Both commercial and academic organizations use Helix. While Helix predictions are extremely accurate, they are intended for Research Use Only, not for use in diagnostic procedures.

Helix can be used for human missense variant analysis in clinical diagnostics, disease association, and patient stratification. Helix provides in-depth missense variant reports as well as structured data.

Helix data can be accessed through our website or our API. Under special circumstances, we can also supply flat file access to all SNPs (Single Nucleotide Polymorphisms) in the Helix database.

The website contains in-depth variant reports, with literature references and explanations of the predictions made by Helix. Samples can be viewed in our portfolio.
The API allows users to access Helix data in a structured manner, with full access to predictions for every missense variant in the human exome. Bulk methods are available for faster throughput.
The flat file contains an overview of predictions and is suitable for loading into your in-house database. Restrictions apply.

Please contact us if you would like to test Helix on your data. We can run Helix on your dataset in a secure environment and work with you to assess the results.

Currently, Helix predicts pathogenicity for missense variants only, and the web application, API and flat file include only those. We are integrating methods that allow for indel prediction; contact us if you are interested in this. CNVs and non-coding SNPs are not something we are focusing on for now.

Helix scores are presented from 0 to 1, or from 0 to 100%. Helix is optimized for a cutoff of 0.5 (50%): scores below 0.5 indicate a benign prediction and scores above 0.5 indicate a pathogenic prediction. This cutoff yields the best balance between precision and recall.
For specific purposes, you may prefer to favor either precision or recall. We encourage you to contact us with your use case, and we will supply you with an in-depth analysis of the best cutoff for you.
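
As a rough illustration of the precision/recall trade-off, the Python sketch below computes both metrics for a set of scores at a chosen cutoff. The function name and the toy scores and labels are hypothetical and are not Helix data. Treating a score exactly equal to the cutoff as pathogenic is also an assumption, since only the behaviour above and below the cutoff is defined here.

```python
def precision_recall(scores, labels, cutoff=0.5):
    """Return (precision, recall) for pathogenic calls at the given cutoff.

    scores: prediction values between 0 and 1
    labels: True for known pathogenic variants, False for known benign ones
    Assumption: a score equal to the cutoff is called pathogenic.
    """
    tp = fp = fn = 0
    for score, is_pathogenic in zip(scores, labels):
        predicted_pathogenic = score >= cutoff
        if predicted_pathogenic and is_pathogenic:
            tp += 1
        elif predicted_pathogenic and not is_pathogenic:
            fp += 1
        elif not predicted_pathogenic and is_pathogenic:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for illustration only (not real predictions).
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
toy_labels = [True, True, False, True, False, False]

for cutoff in (0.3, 0.5, 0.7):
    p, r = precision_recall(toy_scores, toy_labels, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the cutoff favors precision and lowering it favors recall, which is why the best cutoff depends on the use case.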

MCC (Matthews Correlation Coefficient) is a metric that measures the quality of binary predictions, which suits pathogenicity datasets because they have binary labels (pathogenic or benign). ROC AUC (the Area Under the Receiver Operating Characteristic Curve) also measures binary prediction performance, but does so across every possible cutoff point.

ROC AUC is a valuable measure when the prediction cutoff can easily be tuned. However, because genes vary widely in their properties, we find that tuning is difficult in practice. We therefore use MCC to evaluate the performance of the predictor with the cutoff set to 0.5.
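
For readers unfamiliar with MCC, here is a minimal sketch of the standard formula computed from confusion-matrix counts at a single fixed cutoff. The function name is illustrative, and returning 0.0 when the denominator is zero is a common convention assumed here, not something specified by Helix.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted pathogenic/benign variants
    fp/fn: incorrectly predicted pathogenic/benign variants
    Returns a value in [-1, 1]; 1 is perfect, 0 is no better than chance.
    Assumption: an undefined MCC (zero denominator) is reported as 0.0.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# A perfect predictor, and one with a few mistakes in each class.
print(matthews_corrcoef_from_counts(tp=3, tn=3, fp=0, fn=0))
print(matthews_corrcoef_from_counts(tp=2, tn=2, fp=1, fn=1))
```

Because all four counts enter the formula, MCC only scores well when both pathogenic and benign variants are predicted correctly.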

Illustrated below are two prediction landscapes. The X axis corresponds to the residue number of BRCA1, while the Y axis shows the prediction value for Helix (v4.0.1) and REVEL. Red points indicate pathogenic variants, blue points indicate benign variants. Grey points are variants of unknown significance.
A red line across the middle of each plot marks the 0.5 cutoff. A perfect predictor would place all red points above the line and all blue points below it. Box plots show the distribution of predictions for each category.

Helix gene landscape

REVEL gene landscape

REVEL places the vast majority of points above the red line, predicting almost all variants as pathogenic. Helix distributes them more evenly, though not perfectly: BRCA1 is a notoriously difficult gene to predict. This results in better accuracy and a better MCC for Helix.

It is possible to move the cutoff for REVEL in this case to produce a better MCC or accuracy score. This is roughly what ROC AUC measures. While this is feasible for BRCA1, which is well studied, most genes are not as well characterized. Fewer than 10% of all human genes are associated with more than 10 known variants, and most of these variants are benign. We consider adjusting cutoffs based on such sparse data an error-prone exercise. We therefore evaluate predictors at their recommended cutoffs using MCC.
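
The difference between scoring at a fixed cutoff and at a tuned cutoff can be sketched as follows. The scores are synthetic toy values, chosen only to mimic a predictor that places nearly every variant above 0.5. They are not real Helix or REVEL output, and the helper names are hypothetical.

```python
import math


def mcc_at_cutoff(scores, labels, cutoff):
    """MCC when scores >= cutoff are called pathogenic (assumed convention)."""
    tp = tn = fp = fn = 0
    for score, is_pathogenic in zip(scores, labels):
        predicted_pathogenic = score >= cutoff
        if predicted_pathogenic and is_pathogenic:
            tp += 1
        elif predicted_pathogenic:
            fp += 1
        elif is_pathogenic:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def best_cutoff(scores, labels, candidates):
    """Pick the candidate cutoff with the highest MCC on the labelled data."""
    return max(candidates, key=lambda c: mcc_at_cutoff(scores, labels, c))


# Synthetic scores: every variant scores above 0.5, benign ones included.
toy_scores = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [False, False, False, True, True, True]

fixed = mcc_at_cutoff(toy_scores, toy_labels, 0.5)
tuned = best_cutoff(toy_scores, toy_labels, [0.25, 0.5, 0.75])
print(f"MCC at 0.5: {fixed:.2f}")
print(f"tuned cutoff: {tuned}, MCC: {mcc_at_cutoff(toy_scores, toy_labels, tuned):.2f}")
```

The tuned cutoff looks much better here, but it was chosen using the labels themselves. With only a handful of known variants per gene, a cutoff selected this way may not hold up on new variants.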

Beyond its superior performance, Helix has a number of advantages over other predictors.

Helix is available for every missense variant in the human exome.
Helix provides in-depth variant reports with access to literature for every variant.
Helix is consistently updated, in contrast to other predictors that are frequently abandoned once the research project is finished.
You can always contact the Helix team for information and requests related to predictions, and expect a timely response.
Helix predictions are constantly improving, as novel techniques are incorporated into our predictor.

Yes, please get in touch to discuss options.

Frequently Asked Questions

Who can use Helix?

What is Helix used for?

How do I access Helix data?

Can I test Helix?

Does Helix provide predictions for InDels, CNVs or non-coding variants?

What is the best cutoff to use?

Why do you use MCC in your analyses instead of ROC AUC?

How does Helix compare to a predictor like REVEL?

Are Helix predictions available for other organisms?

Is the data that Helix predictions are built on available as a separate resource that can be used to develop other applications or integrate with in-house pipelines?