Cancer Status Can Be Predicted Before Surgery! A Proposed Framework For Preoperative Prediction Of Ovarian Cancer Using Machine Learning


3 main points
✔️ In this study, we developed a highly accurate prediction model of cancer characteristics from preoperative blood test data using supervised machine learning.
✔️ Based on the knowledge obtained from the predictive model, we extracted characteristics of cases that are associated with prognosis and investigated classification patterns of advanced-stage cancer.
✔️ In addition to the highly accurate predictive models, a new prognosis-related disease classification was found that had not been obtained from previous clinical knowledge.

Application of Artificial Intelligence for Preoperative Diagnostic and Prognostic Prediction in Epithelial Ovarian Cancer Based on Blood Biomarkers
Written by Eiryo Kawakami, Junya Tabata, Nozomu Yanaihara, Tetsuo Ishikawa, Keita Koseki, Yasushi Iida, Misato Saito, Hiromi Komazaki, Jason S. Shapiro, Chihiro Goto, Yuka Akiyama, Ryosuke Saito, Motoaki Saito, Hirokuni Takano, Kyosuke Yamada and Aikou Okamoto
(Submitted on 15 May 2019)
Clinical Cancer Research 25(10)

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)


Is it possible to understand the pathology and prognosis of cancer before surgery by machine learning?

The purpose of this study was to develop an algorithm that predicts the characteristics of ovarian cancer using only preoperative information, such as blood test data. Ovarian cancer is one of the most common reproductive tumors in women, and its prognosis tends to be poor. To determine the treatment strategy, it is therefore necessary to understand the pathology of the tumor: whether it is benign or malignant, its stage of progression, and its prognosis. In previous studies, the stage and histological type of ovarian cancer have been predicted mainly by statistical methods, but there are currently no promising biomarkers. In this study, we applied a supervised machine learning algorithm to multiple biomarkers and clinical variables and developed a model that estimates the clinical stage, histological type, surgical results, and prognosis of patients with epithelial ovarian cancer (EOC) before treatment. In addition, we report a new disease classification related to ovarian cancer prognosis, based on preoperative examination data and not available from conventional clinical knowledge. It was obtained from features of plots produced by an unsupervised clustering method during the analysis of cancer-stage classification. These findings are expected to be applied and developed for preoperative prediction and personalized medicine in other diseases.

What is epithelial ovarian cancer (EOC)?

First of all, let's briefly discuss ovarian cancer.

Ovarian cancer has one of the poorest prognoses among female genital tumors, and the number of deaths from this disease has been increasing in recent years, which has attracted much attention. Based on histological findings, the disease can be classified into at least five types (high-grade serous carcinoma, low-grade serous carcinoma, endometrioid carcinoma, mucinous carcinoma, and clear cell adenocarcinoma), and according to the presence or absence of metastases, it can be classified into early-stage carcinoma (stages I and II) and advanced-stage carcinoma (stages III and IV) (see the International Federation of Gynecology and Obstetrics (FIGO) staging). While surgical resection of the tumor is considered the first choice of treatment, the response to chemotherapy is relatively good compared to other cancers, so chemotherapy before and after surgery is a common treatment strategy. The response to chemotherapy varies greatly depending on the stage and histological type of the tumor, and the use of recently introduced anticancer agents, such as PARP inhibitors and antibody drugs, is also being investigated. Against this backdrop, preoperative prediction of the stage and histological type of cancer would make it possible to select an appropriate treatment strategy for each patient.

Prior Research Issues and Research Objectives

Previous studies on ovarian cancer have reported the relationship between prognosis and the stage of progression and histological type, mainly using statistical methods. On the other hand, it is difficult to predict the characteristics of ovarian cancer such as benign/malignant, stage of progression, and prognosis, and to determine the treatment strategy using only preoperative information, because highly invasive procedures such as surgery and biopsy are required to ascertain these factors in actual clinical practice. Statistical prediction models based on biomarkers and multiple clinical factors have also been proposed, but due to problems such as collinearity between variables, it is considered difficult to process large-scale data with multiple input variables and extract appropriate features with these methods.

Against this background, in this study, we developed a model specific to ovarian cancer that estimates the clinical stage, histology, surgical outcome, and prognosis of patients with EOC before treatment, using a machine learning algorithm based on multiple biomarkers and clinical variables.


Data set

For this validation, we used a retrospective cohort dataset of 334 patients with epithelial ovarian cancer (EOC) and 101 patients with benign ovarian tumors, collected between 2010 and 2017. Tumors were classified according to the FIGO classification (2014), and clinicopathological parameters such as age at diagnosis, clinical stage, and residual tumor size after primary surgery were used together with 32 preoperative peripheral blood biomarkers. Random sampling into training and test cohorts was repeated until no variable differed significantly between the two cohorts (P-value ≥ 0.20). As a result, 168 patients with EOC and 51 patients with benign ovarian tumors were assigned to training, and 166 patients with EOC and 50 patients with benign ovarian tumors were assigned to test.
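The repeated-sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the article does not say which statistical test was used, so a permutation test on the difference in means stands in for it, and the function names, the toy patient data, and the simple 50/50 split (without separating EOC and benign cases) are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, rng, n_perm=500):
    """Two-sided permutation test for a difference in means (assumed test)."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def balanced_split(rows, threshold=0.20, seed=0, max_tries=1000):
    """Reshuffle a 50/50 split until every variable has p >= threshold."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    indices = list(range(len(rows)))
    for attempt in range(1, max_tries + 1):
        rng.shuffle(indices)
        half = len(indices) // 2
        train = [rows[i] for i in indices[:half]]
        test = [rows[i] for i in indices[half:]]
        p_values = [
            permutation_p_value([r[v] for r in train], [r[v] for r in test], rng)
            for v in range(n_vars)
        ]
        if min(p_values) >= threshold:
            return train, test, p_values, attempt
    raise RuntimeError("no balanced split found")


# Toy data: 40 hypothetical patients with two made-up numeric variables.
data_rng = random.Random(42)
patients = [(data_rng.gauss(55, 10), data_rng.gauss(4.0, 0.5)) for _ in range(40)]
train, test, p_values, attempt = balanced_split(patients)
print(len(train), len(test), attempt)
```

A split is accepted only when the smallest p-value across all variables reaches the threshold, which is why more variables generally means more resampling attempts.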

Learning models and evaluation methods

Seven supervised machine learning classifiers are considered in this study: Gradient Boosting Machine (GBM), Support Vector Machine (SVM), Random Forest (RF), Conditional RF (CRF), Naïve Bayes, Neural Network, and Elastic Net. The classifiers are trained using 10-fold cross-validation, and their classification performance is evaluated on the test dataset.
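The mechanics of 10-fold cross-validation can be illustrated with a small sketch. The nearest-centroid classifier below is only a stand-in chosen to keep the example dependency-free; it is not one of the seven classifiers used in the study, and the data, labels, and function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=10, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [lab for i, lab in enumerate(y) if i not in held_out]
        model = fit_centroids(X_train, y_train)
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return scores


# Toy data: two hypothetical markers and two well-separated classes.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
y = ["benign"] * 50 + ["EOC"] * 50
fold_scores = cross_validate(X, y)
mean_cv_accuracy = sum(fold_scores) / len(fold_scores)
print(round(mean_cv_accuracy, 3))
```

Each sample is held out exactly once, so the mean of the ten fold scores estimates performance on unseen data without touching the separate test cohort.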


Performance assessment of EOC and benign tumors based on multiple preoperative blood markers

This assessment compares a multiple logistic regression analysis based on all 32 peripheral blood markers with single-variable logistic regression analyses using each marker individually, in order to examine predictors of ovarian tumor characteristics (see figure below).

On the test data, the multiple logistic regression model achieved an accuracy of 86.7% and an AUC of 0.897, confirming its superiority over the single-marker regression models. We also report that supervised machine learning on the same test data, including the 32 peripheral blood markers, predicted EOC with higher performance than models based on traditional regression methods. Specifically, the best prediction performance for EOC was observed for ensemble methods that combine decision trees, such as GBM, RF, and CRF: RF distinguished benign ovarian tumors from EOC with an accuracy of 92.4% and an AUC of 0.968.
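For readers unfamiliar with the two metrics quoted above, the sketch below computes accuracy and AUC on a tiny made-up example. The labels and scores are hypothetical, and the 0.5 decision threshold is an assumption for illustration. AUC is computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = EOC, 0 = benign) and model scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.625
print(roc_auc(y_true, scores))   # 0.875
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers can differ noticeably for the same model.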

(Figure 1)

Unsupervised clustering analysis using a prognosis-related machine learning approach

The purpose of this analysis is to clarify the characteristics of these cases based on the results of predicting the advanced stage of ovarian cancer.

In the aforementioned validation, the stage of cancer progression was predicted with lower accuracy (AUC = 0.760) than the distinction between benign and malignant disease. In response to this result, we hypothesized that preoperative blood test patterns may be similar between some early-stage and advanced ovarian cancers, and we applied the unsupervised random forest method to calculate the similarity between samples.
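As a rough illustration of how a random forest yields a sample-to-sample similarity: in the usual description of the method, two samples are considered similar in proportion to the number of trees in which they end up in the same leaf. The sketch below assumes the leaf assignments are already available (the `leaf_ids` table is made up) and converts proximity into a dissimilarity with sqrt(1 - proximity), a common convention that the article does not confirm.

```python
from math import sqrt


def rf_dissimilarity(leaves):
    """Build a dissimilarity matrix from leaf assignments.

    leaves[i][t] is the leaf reached by sample i in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            proximity = shared / n_trees
            # Assumed convention: dissimilarity = sqrt(1 - proximity).
            d[i][j] = d[j][i] = sqrt(1.0 - proximity)
    return d


# Hypothetical leaf assignments for 4 samples in a 4-tree forest.
leaf_ids = [
    [3, 1, 4, 2],  # sample 0
    [3, 1, 4, 5],  # sample 1: shares 3 of 4 leaves with sample 0
    [7, 6, 4, 5],  # sample 2: shares 2 of 4 leaves with sample 1
    [8, 9, 0, 6],  # sample 3: shares no leaf with any other sample
]
dissimilarity = rf_dissimilarity(leaf_ids)
print(dissimilarity[0][1])  # 0.5
print(dissimilarity[0][3])  # 1.0
```

Because the measure depends only on which leaves samples share, it is unaffected by the scale of individual blood markers.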

To visualize the results, we generated two-dimensional plots using multidimensional scaling (MDS) applied to age at diagnosis and the preoperative blood test data (32 items). MDS places cases with similar patterns of preoperative blood tests close together and cases with different patterns far apart. As a result, the MDS plot with RF dissimilarity as input clearly separated patients with benign tumors from those with late-stage EOC, while early-stage cancers were divided into cases with preoperative blood test patterns similar to those of benign tumors (cluster 1) and cases with preoperative blood test patterns similar to those of advanced cancers (cluster 2). Cluster 1 had almost no recurrence, while cluster 2 had higher recurrence and mortality rates, indicating a strong association with prognosis. In addition, several blood markers differed significantly between the two clusters of early-stage EOC. This clustering of early-stage ovarian cancer is reported to differ from the already known staging (stages I and II), which is a new finding obtained in this validation.
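How MDS turns a dissimilarity matrix into a two-dimensional plot can be sketched as follows. The article does not state which MDS variant was used, so this example assumes classical (Torgerson) MDS, and it uses plain Euclidean distances between five made-up points in place of RF dissimilarities. The function name and the power-iteration eigen-solver are illustrative choices made to avoid external libraries.

```python
import math
import random


def classical_mds(dist, n_dims=2, n_iter=500, seed=0):
    """Embed points so that pairwise distances approximate `dist`.

    Assumes the double-centered matrix has non-negative leading
    eigenvalues, which holds for Euclidean input distances.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J for a symmetric D.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        # Power iteration for the current leading eigenvector of B.
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Toy input: Euclidean distances between five hypothetical 2-D points.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0), (2.0, 3.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(dist)
for row in embedding:
    print([round(c, 3) for c in row])
```

The recovered coordinates may be rotated or mirrored relative to the original points, since only the pairwise distances are preserved; this is also why the axes of an MDS plot carry no direct clinical meaning.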


The purpose of this study was to develop a model for predicting the status of ovarian cancer (benign, malignant, etc.) based on preoperative information alone using machine learning algorithms. To construct the model, we used age and preoperative blood test data (32 items) of ovarian tumor patients (334 patients with malignant ovarian tumors and 101 patients with benign ovarian tumors) and confirmed that supervised machine learning algorithms such as random forest and SVM predicted malignant and benign tumors with high accuracy. In addition, the unsupervised clustering method using MDS plots revealed that early-stage ovarian cancer contains a cluster similar to benign tumors (cluster 1) and a cluster similar to advanced cancer (cluster 2). The authors report that they have discovered a new classification, not available in the existing knowledge, by examining the patient's general condition as reflected in preoperative blood test data. By advancing this research, it is expected that the status of ovarian cancer can be grasped with high accuracy before surgery without highly invasive procedures, which will have a significant impact on treatment decisions and prognosis.

On the other hand, there are two issues: the overall sample size is small (approximately 400 patients), and the risk factors are unclear because of the cross-sectional association. In order to resolve these issues, it is desirable to conduct a prospective cohort study with a larger number of patients and conduct analyses including time series analysis to clarify the causal relationship.
