Can We Use AI To Make Decisions About Kidney Treatment? Estimate The Optimal Time For Renal Replacement Therapy (RRT) Using Clinical Data!
3 main points
✔️ Knowing in advance the optimal timing of renal replacement therapy (RRT), either hemodialysis or kidney transplantation, for chronic kidney diseases (CKD) is an important factor in improving patient health and disease status.
✔️ In this paper, we propose a prediction model to predict the initiation of RRT at 3, 6, and 12 months after the first diagnosis of CKD, using only the comorbidity data from the National Health Insurance in Taiwan.
✔️ Using data from 8,492 patients, we found an area under the receiver operating characteristic curve (AUC) of 0.773 to predict RRT within 12 months of CKD diagnosis.
Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney disease patients
Written by Erik Dovgan, Anton Gradišek, Mitja Luštrek, Mohy Uddin, Aldilas Achmad Nursetyo, Sashi Kiran Annavarajula, Yu-Chuan Li, Shabbir Syed-Abdul
(Submitted on 5 June 2020)
Comments: Accepted to PLoS ONE.
Subjects: RRT, machine learning (ML)
Is it possible to predict the appropriate timing of kidney treatment?
This paper reports on a study of renal replacement therapy (RRT) that uses machine learning to build a model for estimating the appropriate timing of treatment. Decreased renal function has been identified as a major risk factor for cardiovascular disease, and once it worsens, it is difficult to improve, so predicting the appropriate timing of treatment is important for interventions, including prevention. Nevertheless, few studies have investigated how to estimate the optimal timing of RRT. This study investigates a machine learning model that estimates the optimal timing of RRT from data that is readily available in clinical settings: diagnostic data on CKD and its complications. To build the model, the authors examined multiple feature extraction methods and combinations of feature selection methods, and hypothesized that the prediction accuracy of RRT would be significantly improved in a population with complications of CKD, such as diabetes and hypertension. This hypothesis is tested on sample data from patients with diabetes. The study clarifies what can be achieved in estimating the timing of renal replacement therapy and provides useful insights for future research on predictive modeling in medicine.
About the Kidney
First, a brief introduction to the kidney, the organ at the center of CKD. Because the kidneys have no pain sensation and kidney disease is seen mostly in elderly people, many readers may not know much about what this organ does. This section gives a rough, simple picture.
The kidneys are organs located near the center of the back that filter unwanted substances out of the blood and expel them from the body. There are two of them, one on each side of the spine, and they produce urine by separating the substances the body needs from those it does not, in effect filtering the blood. This separation takes place in the glomerulus, a cluster of capillaries that receives about one-fourth of the blood pumped from the heart.
More specifically, the kidney has two functions: filtration, which separates unwanted from necessary substances in the glomerulus, and reabsorption, which recovers necessary substances from what was filtered out. In filtration, the incoming blood is coarsely filtered so that large components such as proteins and red blood cells remain in the blood. The filtrate then passes through the renal tubules, where the smaller molecules the body needs, such as water and electrolytes, are reabsorbed in two stages. Therefore, even when we simply say that the kidneys are 'bad', the cause differs depending on whether filtration or reabsorption is impaired, and this must be identified before choosing a treatment.
In addition, once kidney function declines, it is considered difficult to restore; dialysis is generally recognized as a replacement for kidney function, not a cure. Since the kidneys are the only organs that produce urine, other organs cannot take over their role, and a decline in kidney function places a significant burden on the body and can lead to cardiovascular disease. It is therefore important to prevent and halt the deterioration of the kidneys. Against this background, how early a decline in kidney function can be predicted and how it is treated play a major role in the patient's subsequent quality of life.
What is chronic kidney disease (CKD)?
Chronic kidney disease (CKD) refers to diseases and conditions in which the aforementioned functions of the kidneys, filtration and reabsorption, are chronically impaired, so that the body's filtration fails. It is sometimes used as a general term for kidney disease associated mainly with diabetes, hypertension, and chronic nephritis. In recent years, there have been more reports of CKD caused by lifestyle-related diseases (diabetes and hypertension), which account for a large number of patients. The main characteristic of this type is a decrease in renal function due to vascular damage caused by elevated risk factors for atherosclerosis, such as blood glucose and LDL-C. As mentioned above, filtration is performed by the glomerulus, a cluster of capillaries. Therefore, when blood vessel function is reduced or inhibited by vascular disorders, kidney function is reduced as well. As a result, unwanted substances are not excreted and accumulate throughout the body, starting with edema and stiff shoulders and leading to sepsis and cardiovascular disease. In fact, CKD is said to be one of the biggest risk factors for cardiovascular disease, and there is even a term, cardio-renal linkage, that describes this relationship.
In addition, it is difficult to notice CKD worsening, and in many cases the disease is already advanced when it is discovered. CKD is almost asymptomatic in its early stages, and the kidneys themselves have no sensation of pain, so detection is often delayed. As mentioned above, once kidney function deteriorates, it is difficult to restore, so how early the disease is detected and whether treatment starts at the right time affect the subsequent prognosis.
Treatment for CKD
Renal replacement therapy (RRT), such as dialysis and kidney transplantation, is commonly used to treat advanced CKD, but a delay in initiating these treatments can reduce their effectiveness. These therapies replace kidney function in advanced CKD, that is, end-stage renal disease (ESRD). For example, dialysis filters the blood outside the body and returns it to the body. As mentioned above, CKD has no subjective symptoms and is relatively difficult for non-specialists to diagnose, so patients are often referred to specialists only after their condition has progressed. Studies have shown that this delay in referral can lead to emergency dialysis, an urgent and life-threatening situation that occurs when a permanent treatment device (such as a peritoneal catheter) is not available, and can increase the severity of the condition. Therefore, the development of reliable indicators that predict the need for RRT has been identified as important, as it would allow both physicians and patients to better prepare for these treatments.
Previous research on the prediction of RRT
Currently, there is no conclusive evidence on the optimal time to initiate RRT, and several studies have been reported to estimate the optimal timing. In particular, studies using machine learning to predict the initiation of RRT within a few months to a year and to predict acute renal failure requiring RRT within a few hours to a few days have been reported. On the other hand, these existing studies use laboratory and demographic data for analysis and prediction of RRT, data that are not always available and thus lack universality.
Purpose of this study
The aim of this study is to develop a history-based screening tool: to validate a machine learning model that predicts future RRT at the time of CKD diagnosis, and to identify the comorbidities in the model, such as diabetes, that predict the initiation of RRT at different time points. The study investigates how well the comorbidities and the ML model predict the need for RRT, whether starting dialysis or receiving a transplant, in the next 3, 6, and 12 months. The advantage of this approach is its universality, as it uses readily available data from hospital databases (diagnostic data on CKD and its complications) and does not rely on GFR or other laboratory data. These predictive models are of great benefit to policymakers, hospital administrators, and insurance companies, as they provide insight into trends in disease progression and allow for better allocation of resources, and they are useful to both physicians and patients with CKD for better health planning and resource management.
The study was designed as a retrospective cohort study covering patients who underwent RRT (hemodialysis, peritoneal dialysis, or renal transplantation) within 3, 6, and 12 months after CKD diagnosis (Figure 1). Patients were followed for 3, 6, and 12 months from the date of their first CKD diagnosis and labeled according to whether RRT was initiated in that period (Figure 2).
In the proposal, four methods of feature extraction are tested: 1. raw data, 2. percentage, 3. boolean, and 4. time. 1. Raw data: for each diagnosis, the number of times it occurs in the patient's record is counted. 2. Percentage: for each diagnosis, its occurrence rate against the total number of patient visits is calculated. 3. Boolean: for each diagnosis, the value is 1 if it occurs at least once and 0 otherwise. 4. Time: the observation period is divided into sub-periods, and for each sub-period the value is 1 if the diagnosis occurred and 0 if it did not. Higher weights are assigned to these values for more recent sub-periods, and weights are assigned to all observed sub-periods. In this approach, the observation period is divided into 7 sub-periods of 6 months each, plus the remaining period, counted from the time of CKD diagnosis. The weights are based on the interval indices i = 1, ..., 7, with i = 1 being the most recent interval and i = 7 the most distant. The remaining period has the same weight as the most distant interval, and the weights sum to 1.
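The four feature extraction methods can be illustrated with a minimal Python sketch. This is not the authors' code. The input format (a list of visits, each with its distance in days from the CKD diagnosis and a list of diagnosis names), the function name extract_features, the linearly decreasing weights, and the choice to collapse the time values into a single weighted sum per diagnosis are all assumptions made for illustration; the article only states that newer sub-periods receive higher weights, that the remaining period shares the weight of the most distant interval, and that the weights sum to 1.

```python
from collections import Counter


def extract_features(visits, n_subperiods=7, subperiod_days=180):
    """Build the four feature variants for one patient.

    visits: list of (days_from_ckd_diagnosis, [diagnosis names]) tuples.
    This input layout is a hypothetical one chosen for the example.
    """
    total_visits = len(visits)
    counts = Counter()
    for _, diagnoses in visits:
        counts.update(set(diagnoses))  # count a diagnosis once per visit

    raw = dict(counts)                                            # 1. raw data
    percentage = {d: c / total_visits for d, c in counts.items()}  # 2. percentage
    boolean = {d: 1 for d in counts}                              # 3. boolean

    # 4. time. ASSUMPTION: weights decrease linearly from the most recent
    # interval (i = 1) to the most distant one (i = 7); the remaining
    # period reuses the weight of i = 7; all weights are normalized to sum to 1.
    base = [n_subperiods - i + 1 for i in range(1, n_subperiods + 1)]
    base.append(base[-1])
    weights = [b / sum(base) for b in base]

    time = {}
    for d in counts:
        occurred = [0] * (n_subperiods + 1)
        for days, diagnoses in visits:
            if d in diagnoses:
                # Everything older than the 7 sub-periods falls in the last slot.
                slot = min(days // subperiod_days, n_subperiods)
                occurred[slot] = 1
        # ASSUMPTION: one weighted sum per diagnosis rather than a vector.
        time[d] = sum(w * o for w, o in zip(weights, occurred))

    return raw, percentage, boolean, time


example_visits = [
    (10, ["diabetes", "hypertension"]),
    (200, ["diabetes"]),
    (2000, ["gout"]),
]
raw, percentage, boolean, time = extract_features(example_visits)
print(raw, percentage, boolean, time, sep="\n")
```

In this sketch a diagnosis seen in both of the two most recent sub-periods scores higher on the time feature than one seen only long ago, which is the behavior the weighting is meant to capture.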
Feature selection and adjustment
Three approaches to feature selection are investigated in the proposed method: 1. correlation between features and RRT, removing diagnoses whose correlation is lower than 0.1; 2. diagnoses related to CKD, i.e., comorbidities of CKD; and 3. selection based on mutual information and classification performance. For 2., the comorbidities include acute glomerulonephritis, chronic glomerulonephritis, diabetes mellitus, essential hypertension, hyperlipidemia, polycystic kidney disease, and kidney stones. For 3., based on their mutual information with the class, diagnoses are added to the input data sequentially until the performance of a random forest classifier stops improving. Principal Component Analysis (PCA) is also used to aggregate the 5624 diagnosis features and reduce them to 10 components. The dataset is imbalanced: patients without RRT account for 87% of the 12-month prediction period, 94% of the 6-month prediction period, and 96% of the 3-month prediction period. To mitigate the effect of this imbalance, each patient is assigned a weight that is inversely proportional to the frequency of the patient's class in the database.
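The class weighting described above can be sketched in a few lines of Python. The article only says that the weights are inversely proportional to the class frequencies; the specific normalization used here, n_samples / (n_classes * class_count), is an assumption borrowed from the common "balanced" heuristic, and the function name inverse_frequency_weights is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per patient, inversely proportional to class frequency.

    ASSUMPTION: weights are scaled as n_samples / (n_classes * class_count),
    so that every class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[y] for y in labels]


# Illustrative labels mirroring the 12-month split: 87% without RRT (0), 13% with RRT (1).
labels = [0] * 87 + [1] * 13
weights = inverse_frequency_weights(labels)
print(weights[0], weights[-1])
```

With these weights the 13 RRT patients together carry the same total weight as the 87 non-RRT patients, so a classifier that accepts per-sample weights is no longer rewarded for simply predicting "no RRT" for everyone.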
This study investigates the performance of several machine learning methods: Decision Tree, Bagging Decision Trees, Random Forest, XGBoost, Support Vector Machines, Stochastic Gradient Descent (SGD), Nearest Neighbors, Gaussian Naive Bayes, Logistic Regression, and Neural Network (Table 1). Bagging Decision Trees and Random Forest are both ensemble learning methods that train many decision trees and combine their predictions; Bagging trains each tree on a random resample of the training data, while Random Forest additionally restricts each split to a random subset of the features.
In this paper, the training and test sets are divided by 10-fold cross-validation so that the ratio of RRT to non-RRT patients is the same in each fold. The performance measure is the Area Under the Receiver Operating Characteristic Curve (AUC), and the average of the AUC values obtained over the folds is reported as the result. Because AUC is computed from the true positive rate and false positive rate across all decision thresholds rather than from the raw proportion of correct predictions, it is a suitable measure for imbalanced data sets such as this one. The sensitivity and specificity of the ML models are also reported.
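The evaluation protocol can be sketched as follows, using only the Python standard library. Both functions are illustrative assumptions rather than the authors' implementation: stratified_folds deals the patients of each class round-robin into k folds so that the class ratio is preserved, and auc uses the pairwise definition of AUC (the probability that a randomly chosen RRT patient receives a higher score than a randomly chosen non-RRT patient, with ties counted as one half).

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that keep the class ratio of `labels`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=10)
print([sum(labels[i] for i in fold) for fold in folds])  # RRT patients per fold
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a full experiment, each fold would serve once as the test set while a model is trained on the other nine, and the ten AUC values would be averaged. The pairwise AUC is quadratic in the number of samples, which is fine for a sketch but slow on large cohorts.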
Predicted results after 3, 6, and 12 months
The purpose of this evaluation is to clarify how well each machine learning model predicts RRT within 12 months of CKD diagnosis.
The data processing parameters of the algorithms sorted by AUC (Table 2) and the results with AUC > 0.7 (Fig. 3) show that the highest AUC is obtained when none of the preprocessing steps introduced in this study (feature selection, filtering, or dimensionality reduction) is performed. It was also confirmed that balancing the data improved the trade-off between sensitivity and specificity. Among the feature extraction methods, time gave the best results.
The results after 3 and 6 months (Fig. 4) also show that balancing the data improves the trade-off between sensitivity and specificity, and that the best performing ML models are Logistic Regression and SGD.
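To make the sensitivity and specificity trade-off concrete, here is a small Python sketch. The function name, the toy labels and scores, and the thresholds are all hypothetical; they are not values from the paper and only show how the two measures move in opposite directions as the decision threshold changes.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when predicting RRT for scores >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data for illustration only: 1 = RRT, 0 = no RRT.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
for threshold in (0.5, 0.15):
    print(threshold, sensitivity_specificity(labels, scores, threshold))
```

Lowering the threshold catches more of the RRT patients (higher sensitivity) at the cost of more false alarms among non-RRT patients (lower specificity).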
Comparison of models for all patients and diabetic patients
This evaluation was conducted to determine the impact of diabetic nephropathy, a major complication of CKD, that is, the difference in predictive performance between CKD patients with and without diabetes. The authors hypothesized that RRT would be easier to predict in patients with such complications, and to test this hypothesis, they evaluated patients with type 2 diabetes and patients without diabetes separately.
The results of the evaluation (Figure 5) showed that the highest AUC was obtained when training and testing with all patients, the second-highest when training with all patients and testing only with diabetic patients, and the lowest when training and testing only with diabetic patients. The authors attribute this largely to the difference in the amount of training data.
Estimating the appropriate timing of renal replacement therapy (RRT) can help prevent renal function decline and, ultimately, cardiovascular disease, so high accuracy is required. However, few studies have investigated the optimal timing of RRT initiation. This study investigated a machine learning model for estimating the optimal timing of RRT using data that can be easily obtained in clinical settings: diagnostic data on CKD and its complications. Various combinations of feature extraction and feature selection methods were exhaustively investigated. The hypothesis that the prediction accuracy of RRT may be improved in patients with CKD complications, such as diabetes and hypertension, was also tested on sample data from patients with diabetes.
As a result of the evaluation, the best performance, AUC = 0.773, was obtained with the Logistic Regression model using the time feature extraction defined by the authors. Predictive performance was also compared between data from all patients, including those with diabetes, and data from only those with a history of diabetes. Performance on the dataset including all patients was higher than on the diabetes-only dataset, which indicates that the amount of data has a greater influence than biological factors. Because this accuracy cannot be considered high, the model is not yet ready for use in clinical practice, but the results are expected to contribute to future research on predictive models in the medical field, for example, in determining treatment strategies.
An extension of this work is the application of the estimation model to different kidney-related diseases such as Acute Kidney Injury (AKI) and proteinuria. Among these diseases, the estimation of AKI is particularly meaningful as it often requires urgent treatment (a deep learning paper by Google has been published previously). In addition, proteinuria is strongly associated with other renal diseases, and some epidemiological studies have pointed out that it is one of the biggest predictors of CKD. Therefore, it would be clinically significant if such diseases could be estimated using only clinical data, as shown in this paper.
One point that deserves scrutiny is the validity of the way the proportions of RRT and non-RRT patients were set in this evaluation. As mentioned above, in the 10-fold cross-validation, each fold is adjusted so that these proportions are the same. However, as discussed in this paper, real-world datasets are often imbalanced, so results obtained on uniformly proportioned folds may not carry over. In particular, when clinical data are used, as in this case, the imbalance (RRT patients being relatively few) is likely to be more pronounced, and it is therefore necessary to include the results of tests with various proportions in addition to the uniform case.