Technology
From cosmology to oncology
Studying dark matter
Dark matter is a property of the entire universe, yet it cannot be measured directly.
To overcome this, astrophysicists infer the unmeasurable properties of dark matter from individual pictures of galaxies.
These pictures are effectively combined into a single unified dataset to decipher dark matter’s general properties.
Oncology application
Concr teamed up with astrophysicists to adapt these established algorithms for genuine integration of disparate oncology datasets, creating a single holistic model of patient response.
By overcoming the data integration barrier, Concr not only enables powerful analytics, but also provides a critical advantage when working with limited or incomplete data.
Case study
Accurate prediction of in vitro drug response with 300x less data
Key takeaways
Isolated high-confidence results by accurately predicting molecular features most representative of efficacy for therapeutics
300x more data-efficient: 2-3 cell lines were sufficient to achieve a comparable RMSE, compared with 600 cell lines using other methods
Ability to generalise to novel drugs and cell lines across therapeutic classes and indications
Authors
Matthew Griffiths, Eilish Middlehurst, Matthew Foster
Background and aim
Data availability and quality are often insufficient for meaningful and accurate analysis.
Here we address this challenge using data from the Genomics of Drug Sensitivity in Cancer¹ (GDSC) project to predict the IC50 for a specified drug and cell-line pairing.
Data & Modelling
Data input:
810 cell lines (WXS and RNAseq)
175 compounds (SMILES)
118,595 dose response curves (IC50)
Modelling:
Infer cell phenotype
For each drug, the dataset was split 80:20 into training and validation sets, keeping an even ratio of sensitive to resistant cell lines. All response data for Olaparib and Niraparib were excluded from training.
The Concr model was trained on the training set to predict IC50 values, and then used the SMILES molecular descriptions and the dose-response data to predict the efficacy of Niraparib and Olaparib.
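The split described above can be sketched as follows. This is a minimal illustration, not Concr's pipeline: the record layout, the `stratified_split` name and the ready-made "sensitive"/"resistant" labels are assumptions, and "an even ratio" is read here as keeping the same sensitive-to-resistant ratio in both sets.

```python
import random
from collections import defaultdict

# Drugs whose response data are withheld from training, as stated in the text.
HELD_OUT_DRUGS = {"Olaparib", "Niraparib"}


def stratified_split(records, train_frac=0.8, seed=0):
    """Split (drug, cell_line, label) records 80:20 per drug and per label.

    `label` is "sensitive" or "resistant". How that label is derived from
    IC50 is not stated in the source, so it is taken as given (assumption).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        drug, _cell_line, label = record
        if drug in HELD_OUT_DRUGS:
            continue  # never enters the training or validation set
        groups[(drug, label)].append(record)

    train, validation = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation
```

Splitting within each (drug, label) group is one simple way to keep the class ratio the same on both sides of the split for every drug.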
Results
Predicted and observed IC50 values are comparable. The model generates its own uncertainty estimates, so the most confident predictions can be extracted. The Concr model had an RMSE* of 0.46, comparable to the best possible RMSE of 0.4.
The Concr model was given the dose-response data for 2 randomly selected cell lines for Olaparib and 3 for Niraparib. The model accurately predicted the IC50 for the 800 unseen cell lines (RMSE = 0.51 and 0.53, respectively).
Accuracy was comparable to that achieved by state-of-the-art methods²,³ with 700 cell lines (RMSE = 0.45).
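For readers unfamiliar with the metric, the sketch below shows how an RMSE is computed from observed and predicted values, and how the most confident predictions could be picked out when each prediction carries an uncertainty. The function names and the (prediction, uncertainty) tuple layout are illustrative assumptions, not part of the Concr model.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def most_confident(predictions, k):
    """Return the k (prediction, uncertainty) pairs with the lowest uncertainty."""
    return sorted(predictions, key=lambda pair: pair[1])[:k]
```

A lower RMSE means predictions sit closer to the observed values, in the same units as the quantity being predicted.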
References
1. Iorio, F., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754.
2. Chang, Y., et al. (2018). Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature. Scientific Reports, 8(1), 1–11.
3. Rahman, R., et al. (2017). Heterogeneity aware random forest for drug sensitivity prediction. Scientific Reports, 7(1), 1–11.
*RMSE = root-mean-square error
Case study
Concr modelling is generalisable across drugs and cancer types
Key takeaways:
Concr modelling generalises to drugs it is 'blind' to
Concr modelling shows early evidence of being generalisable across cancer types
Background and aim
We have previously demonstrated accurate response predictions using the comprehensive Genomics of Drug Sensitivity in Cancer¹ (GDSC) dataset. However, most drugs do not have a database of 800+ cell lines with associated IC50 values. We therefore set out to assess how well our modelling generalises across cancer types, and to drugs the models are ‘blind’ to.
Data & Modelling
Data input:
810 cell lines (WXS and RNAseq)
175 compounds (SMILES)
118,595 dose response curves (IC50)
Precision Panc cell line data
Modelling:
The Concr model was trained on GDSC data to predict IC50 values, using the SMILES molecular descriptions and the dose-response data to predict drug efficacy. The model was then applied directly to the Precision Panc cell line dataset to test its predictive accuracy.
For drug generalisability, the process of progressively performing cell line viability experiments was simulated and the process was repeated 30 times to provide aggregate statistics. In this study no structural information about the therapy was provided.
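The simulated screening loop can be pictured roughly as below. This is a toy sketch under strong assumptions: the source does not describe the model or the selection rule, so a least-squares line on a single hypothetical per-cell-line feature stands in for the Concr model, and the greedy "pick the lowest predicted IC50" rule, the two random starting experiments and all function names are illustrative only.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # no spread in x yet: predict the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_screen(features, true_ic50, n_iterations, seed):
    """One simulated run: measure, refit, pick the next most promising line."""
    rng = random.Random(seed)
    untested = list(range(len(true_ic50)))
    rng.shuffle(untested)
    measured = [untested.pop(), untested.pop()]  # two random starting experiments
    while len(measured) < n_iterations and untested:
        slope, intercept = fit_line(
            [features[i] for i in measured], [true_ic50[i] for i in measured]
        )
        best = min(untested, key=lambda i: slope * features[i] + intercept)
        untested.remove(best)
        measured.append(best)  # "running the experiment" reveals true_ic50[best]
    return measured


def mean_recall(features, true_ic50, cutoff, n_iterations, n_repeats=30):
    """Average fraction of sub-cutoff cell lines found, over repeated runs."""
    sensitive = {i for i, value in enumerate(true_ic50) if value < cutoff}
    if not sensitive:
        raise ValueError("no cell line falls below the cutoff")
    recalls = []
    for seed in range(n_repeats):
        picked = set(simulate_screen(features, true_ic50, n_iterations, seed))
        recalls.append(len(picked & sensitive) / len(sensitive))
    return statistics.fmean(recalls)
```

Repeating the run with different random starting points, 30 times by default here, gives aggregate statistics that do not depend on one lucky or unlucky starting pair.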
Results
The overall predictions of the GDSC model have an r² score of 0.414. Error bars show the uncertainty in the prediction; point colour indicates the cell line and point shape the drug used.
After fewer than 20 iterations the model accurately chose cell lines with IC50 values below the targeted cutoff (left panel). After 50 iterations the majority of the most sensitive cell lines had been selected, and after 100 nearly all of them had (right panel).
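As a reference point for the score quoted above, the sketch below computes the coefficient of determination. It assumes the reported r² is this quantity (1 minus the ratio of residual to total sum of squares) rather than a squared correlation coefficient; the source does not say which, and the function name is illustrative.

```python
import statistics


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variance")
    return 1.0 - ss_res / ss_tot
```

Under this definition a score of 1 is a perfect fit and a score of 0 is no better than always predicting the mean observed value.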
References
1. Iorio, F., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754.
Case study
Identifying and validating breast cancer biomarkers for cohort stratification
Key takeaways:
Superior predictive accuracy compared to other methods
7x less patient data required for model training compared to the next best approach
First of its kind: disease-free survival stratification
Authors
Matthew Griffiths, Uzma Asghar, Matthew Foster
Background and aim
Better patient stratification in early-phase trials through more effective biomarkers would not only improve efficacy and speed up drug approval, but also reduce the length and cost of trials, ultimately passing the savings on to healthcare providers.
In this study we used Concr advanced statistical modelling to predict cell profiles of responding breast cancer patients and their overall and disease-free survival, segmented into risk groups.
Data & Modelling
Data input - TCGA:
1098 breast cancer patients
Therapy used
Outcome (OS, DFS)
Tumour data (WGS, RNASeq, Illumina 450k Methylation)
Modelling:
A Concr hierarchical Bayesian multi-omic model was created to identify risk profiles, excluding the 500 patients who received alkylating therapy.
The dataset was split into 5 random cohorts: 1 was used as validation, and 4 were used to train the OS and DFS models, repeated 5 times.
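The five-cohort scheme above corresponds to standard k-fold cross-validation, which can be sketched as below. The function name and the use of integer patient indices are assumptions for illustration; the source does not say how the cohorts were drawn beyond being random.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k random, disjoint cohorts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k cohorts of near-equal size
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each patient appears in exactly one validation cohort, so pooling the five validation predictions gives one held-out prediction per patient.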
Results
Figure 1. Concr's custom Bayesian cell admixture model identified recurring multi-omic cell profiles in the TCGA cohort, excluding patients who received alkylating therapy. These cell profiles could then be used to infer the subpopulation breakdown of the tumours of patients who had received alkylating agents.
Figure 2. Using k-fold validation, risk profiles were associated with the cell profiles identified by the model, segmenting patients who received alkylating therapy by DFS and OS. Results shown are aggregated predictions of five independent validations. AUC = 0.88, superior to other methods¹ and using 7x fewer patients than the next best approach.
Figure 3. Our models identified specific recurring subtypes of tumour cells commonly seen across the cohort, stratified by the genetic, transcriptomic and methylation status into subtypes 1-11. A ‘risk’ score was calculated for each therapy, with significant variation observed across the treatments and subtypes (whiskers = upper / lower quartiles of the confidence interval).
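The AUC reported for Figure 2 is the area under the ROC curve. The sketch below computes it from binary labels and risk scores using the pairwise-ranking formulation; the function name and the 0/1 label encoding are assumptions, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0), counting ties as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.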
References
1. Dubourg-Felonneau, et al. (2018). A Framework for Implementing Machine Learning on Omics Data. 1–5.
TCGA - The Cancer Genome Atlas
OS - overall survival
DFS - disease-free survival
Our partners leverage Concr advanced predictive modelling across every stage of therapeutic development to create shared value for the benefit of cancer patients.