Technology

From cosmology to oncology

Studying dark matter

Dark matter is a property of the entire universe, yet it cannot be measured directly.

To overcome this, astrophysicists infer the unmeasurable properties of dark matter from individual pictures of galaxies.

These pictures are effectively combined into a single unified dataset to decipher dark matter’s general properties.

Oncology application

Concr teamed up with astrophysicists to adapt these established algorithms for genuine integration of disparate oncology datasets, creating a single holistic model of patient response.

By overcoming the data integration barrier, Concr not only enables powerful analytics, but also provides a critical advantage when working with limited or incomplete data.

Case study

Accurate prediction of in vitro drug response with 300x less data

Key takeaways


Isolated high-confidence results by accurately predicting the molecular features most representative of efficacy for therapeutics

300x more data-efficient: 2-3 cell lines were sufficient to achieve a comparable RMSE, compared to 600 cell lines using other methods

Ability to generalise to novel drugs and cell lines across therapeutic classes and indications

Authors

Matthew Griffiths, Eilish Middlehurst, Matthew Foster

Background and aim

Data availability and quality are often too poor for meaningful and accurate analysis to be performed.

Here we address this challenge using data from the Genomics of Drug Sensitivity in Cancer dataset [1] to predict the IC50 for a specified drug and cell-line pairing.

Data & Modelling

Data input:

  • 810 cell lines (WXS and RNAseq)
  • 175 compounds (SMILES)
  • 118,595 dose response curves (IC50)

Modelling:

Illustration 1

Infer cell phenotype

For each drug, the dataset was split 80:20 into training and validation sets, with an even ratio of sensitive to resistant cell lines in each. All response data for Olaparib and Niraparib were excluded from training.

The Concr model was trained on the training set to predict IC50 values, and then used the SMILES molecular descriptions and the dose-response data to predict the efficacy of Niraparib and Olaparib.
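The split described above can be sketched roughly as follows. This is a minimal illustration, not the Concr pipeline: the function name `stratified_split`, the toy drug names "DrugA" and "DrugB", and the boolean sensitive/resistant label are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(records, held_out_drugs, train_fraction=0.8, seed=0):
    """Split (drug, cell_line, is_sensitive) records per drug.

    Each (drug, is_sensitive) group is split separately, so the ratio of
    sensitive to resistant cell lines is similar in both parts.
    Records for held-out drugs never enter the training set.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    held_out = []
    for record in records:
        drug, _cell_line, is_sensitive = record
        if drug in held_out_drugs:
            held_out.append(record)
        else:
            groups[(drug, is_sensitive)].append(record)

    train, validation = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation, held_out


# Toy data (hypothetical): 10 cell lines per drug, half labelled sensitive.
records = [
    (drug, f"line{i}", i % 2 == 0)
    for drug in ("DrugA", "DrugB", "Olaparib", "Niraparib")
    for i in range(10)
]
train, validation, held_out = stratified_split(records, {"Olaparib", "Niraparib"})
```

Splitting each sensitive/resistant group separately is one simple way to keep the class ratio even; how sensitivity was actually thresholded is not stated in the case study.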

Results

Predicted and observed IC50 values are comparable. The model estimates its own uncertainty, so the most confident predictions can be extracted. The Concr model had an RMSE* of 0.46, comparable to the best possible RMSE of 0.4.
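As an illustration of the two ideas in this paragraph, the sketch below computes an RMSE and then keeps only the predictions with the lowest reported uncertainty. The toy numbers, the `most_confident` helper and the 50% cut-off are assumptions for the example, not values from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def most_confident(rows, keep_fraction=0.5):
    """Keep the rows with the smallest uncertainty.

    rows: list of (predicted, observed, uncertainty) tuples.
    """
    ranked = sorted(rows, key=lambda row: row[2])
    keep = max(1, int(keep_fraction * len(ranked)))
    return ranked[:keep]


# Toy predictions (hypothetical): the two uncertain rows are also the worst.
rows = [(1.0, 1.1, 0.1), (2.0, 2.0, 0.2), (0.5, 1.5, 0.9), (3.0, 2.0, 0.8)]
overall_rmse = rmse([r[0] for r in rows], [r[1] for r in rows])
confident = most_confident(rows)
confident_rmse = rmse([r[0] for r in confident], [r[1] for r in confident])
```

If the model's uncertainty is well calibrated, the RMSE over the confident subset should be lower than the overall RMSE, as it is in this toy case.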

Graph 1

The Concr model was given the dose-response data for 2 randomly selected cell lines for Olaparib and 3 for Niraparib. The model accurately predicted the IC50 for the 800 unseen cell lines (RMSE = 0.51 and 0.53, respectively).

Graph 2

Accuracy was comparable to that achieved by state-of-the-art methods [2, 3] with 700 cell lines (RMSE = 0.45).

References

[1] Iorio, F., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754.

[2] Chang, Y., et al. (2018). Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature. Scientific Reports, 8(1), 1–11.

[3] Rahman, R., et al. (2017). Heterogeneity aware random forest for drug sensitivity prediction. Scientific Reports, 7(1), 1–11.

*RMSE = root-mean-square error

Case study

Concr modelling is generalisable across drugs and cancer types

Key takeaways:

Concr modelling can be generalised to drugs it is 'blind' to

Concr modelling shows early evidence of being generalisable across cancer types

Background and aim

We have previously demonstrated accurate response predictions using the comprehensive Genomics of Drug Sensitivity in Cancer (GDSC) dataset [1]. However, most drugs do not have a database of 800+ cell lines with associated IC50 values. Hence we set out to assess the generalisability of our modelling across cancer types, and to drugs the models are ‘blind’ to.

Data & Modelling

Data input:

  • 810 cell lines (WXS and RNAseq)
  • 175 compounds (SMILES)
  • 118,595 dose response curves (IC50)
  • Precision Panc cell line data

Modelling:

The Concr model was trained on GDSC data to predict IC50 values, using the SMILES molecular descriptions and the dose-response data to predict drug efficacy. It was then applied directly to the Precision Panc cell line dataset to test predictive accuracy.

For drug generalisability, the process of progressively performing cell line viability experiments was simulated, and the simulation was repeated 30 times to provide aggregate statistics. In this study, no structural information about the therapy was provided.

Results


The overall predictions of the GDSC model have an R² score of 0.414. Error bars = uncertainty in the prediction; colour of the points = cell line; shape = drug used.
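For readers unfamiliar with the metric, here is a minimal sketch of an R² calculation, assuming the score quoted is the usual coefficient of determination (the case study does not say which definition was used). The toy values are invented for the example.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical): every prediction is off by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
score = r2_score(observed, predicted)
```

A score of 1 means perfect prediction, and a score of 0 means the model does no better than always predicting the mean of the observed values.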


After fewer than 20 iterations, the model accurately chose cell lines with IC50 values below the targeted cutoff (left panel). After 50 iterations, the majority of the most sensitive cell lines had been selected, and after 100, nearly all of them had (right panel).

References

[1] Iorio, F., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754.

Case study

Identifying and validating breast cancer biomarkers for cohort stratification

Key takeaways:

Superior predictive accuracy compared to other methods

7x less patient data required for model training compared to the next best approach

First of its kind: disease-free survival stratification

Authors

Matthew Griffiths, Uzma Asghar, Matthew Foster

Background and aim

Better patient stratification in early-phase trials through more effective biomarkers would not only improve efficacy and speed up drug approval, but also reduce the length and cost of trials, ultimately passing the savings on to healthcare providers.

In this study we used Concr advanced statistical modelling to predict cell profiles of responding breast cancer patients and their overall and disease-free survival, segmented into risk groups.

Data & Modelling

Data input - TCGA:

  • 1098 breast cancer patients
  • Therapy used
  • Outcome (OS, DFS)
  • Tumour data (WGS, RNASeq, Illumina 450k Methylation)

Modelling:

A Concr hierarchical Bayesian multi-omic model was created to identify risk profiles, excluding the 500 patients who received alkylating therapy.

The dataset was split into 5 random cohorts: 4 were used to train the OS and DFS models and 1 was used for validation. This was repeated 5 times.
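The cohort scheme above resembles standard 5-fold cross-validation. A minimal sketch of the index bookkeeping, assuming each cohort serves as the validation set exactly once (the function name and the toy sample count of 20 are illustrative only):

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal cohorts.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield training, validation


# Toy example (hypothetical size): 20 patients, 5 cohorts of 4.
splits = list(k_fold_indices(n_samples=20, k=5))
```

Each patient appears in exactly one validation cohort, so aggregating the five validation results gives one out-of-sample prediction per patient.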

Results


Figure 1. The Concr custom Bayesian cell admixture model identified recurring multi-omic cell profiles in the TCGA cohort, excluding patients who received alkylating therapy. These cell profiles could then be used to infer the subpopulation breakdown of the tumours of patients who had received alkylating agents.


Figure 2. Using k-fold validation, risk profiles were associated with the cell profiles identified by the model, segmenting patients who received alkylating therapy by DFS and OS. Results shown are aggregated predictions of five independent validations. AUC = 0.88, superior to other methods [1] and achieved using 7x fewer patients than the next best approach.


Figure 3. Our models identified specific recurring subtypes of tumour cells commonly seen across the cohort, stratified by the genetic, transcriptomic and methylation status into subtypes 1-11. A ‘risk’ score was calculated for each therapy, with significant variation observed across the treatments and subtypes (whiskers = upper / lower quartiles of the confidence interval).

References

[1] Dubourg-Felonneau, et al. (2018). A Framework for Implementing Machine Learning on Omics Data. 1–5.


TCGA - The Cancer Genome Atlas
OS - overall survival
DFS - disease-free survival

Case study

Concr modelling identifies and prioritises determinants of breast cancer survival and outcome

Key takeaways:

Concr technology can prioritise measurable features and integrate multiple data types, yielding highly accurate models

Concr models can be used to simulate different treatment scenarios and clinical trial designs to identify optimal treatment strategies

Concr models can identify patient responders at cohort- and individual-level, allowing biomarker identification for best drug response

Authors

Aidan Kubeyev, Matthew Griffiths, Uzma Asghar, Matthew Foster

Background and aim

Identifying patient-specific features predictive of response to cancer therapy is vital for effective clinical drug development: it de-risks clinical trials and avoids spending resources on generating data that does not yield meaningful biomarker information. Importantly, it would also lead to improved patient survival and outcomes in the clinical setting. In this study, we set out to identify and prioritise the patient data features that were most informative for predicting response in early-stage breast cancer patients, using Concr advanced statistical modelling. We then applied our composite model to predict survival across all subtypes of breast cancer and compared the predictions to the actual patient outcomes to measure putative improvements.

Data & Modelling

Data input - TCGA:

  • 1096 early-stage breast cancer patients
  • Therapy used
  • Outcome (OS, DFS)
  • Tumour data (CNV, hormone status)

Modelling:


A Random Survival Forest approach was adopted and fitted within the Concr composite model framework. The learning process was carried out by creating numerous decision trees, with parameters from multiple data types. Using the Concr composite model framework, biomarker and data type selection were based on the model indicating correct responses in unseen data.

Results


Figure 1. Measurement of clinicopathological features carries most power in predicting outcome.

The relative power of measurable features to predict drug response was evaluated in the Concr model using permutation-based importance. It was determined by measuring the decrease in the model's score when the values of biomarkers were rearranged randomly. Clinical standard-of-care features (blue bars) were identified as the major differentiators of the outcome.
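Permutation-based importance, as described above, can be sketched in a few lines. This is a generic illustration rather than the Concr implementation: the scoring function `negative_mse` stands in for a fitted model's score, and the toy data are invented so that only feature 0 carries any signal.

```python
import random


def permutation_importance(score_fn, rows, targets, feature_index, n_repeats=10, seed=0):
    """Mean drop in score when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = score_fn(rows, targets)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_index] for row in rows]
        rng.shuffle(column)
        shuffled = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - score_fn(shuffled, targets))
    return sum(drops) / n_repeats


def negative_mse(rows, targets):
    """Hypothetical stand-in 'model': predicts the target from feature 0 only."""
    return -sum((row[0] - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


# Toy data (hypothetical): the target equals feature 0; feature 1 is ignored.
rows = [[float(i), float(i % 3)] for i in range(12)]
targets = [row[0] for row in rows]
importance_informative = permutation_importance(negative_mse, rows, targets, 0)
importance_ignored = permutation_importance(negative_mse, rows, targets, 1)
```

Shuffling the informative feature lowers the score, giving a positive importance, while shuffling the ignored feature leaves the score unchanged.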


Figure 2. Combining clinical and molecular features enhances model predictive accuracy.

To test whether incorporating multi-source information (i.e. clinical, genomic, imaging data) into statistical learning would improve the prediction accuracy of treatment response, the Concr model was trained using data from 743 breast cancer patients (across sub-types and treatments).

Results depict the differential accuracy of the Concr risk model, trained using distinct data parcels (x-axis), as measured by a time-integrated Area Under the Curve (AUC) and c-index. The predictive accuracy improved stepwise as additional relevant data types were added, with a final AUC of 0.8 and c-index of 0.82.
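The c-index quoted here measures how often a model ranks pairs of patients in the right order. Below is a minimal sketch of Harrell's concordance index for right-censored data; the toy survival times, event flags and risk scores are invented, and the case study does not state which c-index variant was used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's event or censoring time. It is concordant when
    patient i was also given the higher risk score; ties count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy cohort (hypothetical): the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risk_scores = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk_scores)
```

In this toy cohort, 4 of the 5 comparable pairs are ordered correctly, so the c-index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.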


Figure 3. Concr model predicts optimal treatments for improved overall survival.

Having developed an accurate predictive model of breast cancer, we used it to run numerous alternative treatment scenarios (simulating clinical trials) and retrospectively explore whether better treatments could have been administered.

The Kaplan-Meier analysis (top plot) demonstrated that a sub-cohort of patients could have received better treatment, based on increased overall survival. The difference between the treatment received and the Concr-predicted ‘improved treatment’ regimen is depicted in the bottom plot.
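For context, a Kaplan-Meier curve like the one in the top plot can be computed as in the sketch below. It is a textbook version of the estimator run on invented toy data, not the analysis code behind the figure.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time per patient; events: True if the event (death)
    was observed, False if the patient was censored at that time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for time in times if time >= t)
        deaths = sum(1 for time, e in zip(times, events) if time == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort (hypothetical): two patients are censored (at times 2 and 4).
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

Comparing the curve for the treatment received with the curve for the model-suggested treatment is what shows the gap in overall survival described above.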


Figure 4. Concr modelling allows the identification of measurable criteria for biomarkers, capturing individual patients’ best treatment response.

Once the Concr model had identified a cohort of patients who could have benefitted from a different treatment (coloured dots above the diagonal line), it enabled feature exploration for individual patients. Each dot represents a patient in the ‘Concr-improved treatment’ cohort, displaying ER, PR and HER2 receptor status as an example of feature exploration.

References

TCGA - The Cancer Genome Atlas
OS - overall survival
DFS - disease-free survival
CNV - copy number variant


Partnerships

Our partners leverage Concr advanced predictive modelling across every stage of therapeutic development to create shared value for the benefit of cancer patients.


FarrSight™️

Our cloud-based platform effectively integrates diverse data parcels to generate meaningful insights.
