A Method for Predicting Cancer Drug Candidates Based on Adversarial Domain Adaptation Is Proposed!

Neural Network 31/10/2024

3 main points
✔️ New model proposed to estimate drug candidates associated with suppression of cancer stem cell function
✔️ Uses adversarial domain adaptation techniques to learn while eliminating bias in two different data sets
✔️ Drug candidates predicted by the model are validated by experiments using real cells

AI identifies potent inducers of breast cancer stem cell differentiation based on adversarial learning from gene expression data
written by Zhongxiao Li, Antonella Napolitano, Monica Fedele, Xin Gao, Francesco Napolitano
(Submitted on 22 August 2023)
Comments: Published on bioRxiv

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Research Background and Key Points of the Method

Cancer, a disease expected to affect one out of every two Japanese people in the future, remains one of the most intractable diseases, with no established cure even in the modern era. Cancer is known to arise and progress when the mechanisms that regulate cell growth in animals, including humans, break down and cells turn into special cells called cancer cells.

Among the most distinctive types of cancer cells are cancer stem cells. Stem cells are special cells that can replicate themselves or change into cells with a wide variety of functions (a process called differentiation). Cancer stem cells combine the properties of a cancer cell, in which the regulation of cell growth has broken down, with the properties of a stem cell.

These cancer stem cells are known to have a significant impact on cancer metastasis and recurrence. In recent years in particular, drugs that induce cancer stem cells to differentiate, and thereby suppress cancer progression, have been developed as cancer treatments.

This paper describes the development of a new model to identify new drug candidates using a technique called adversarial domain adaptation, with a particular focus on breast cancer among other cancers.

Model Structure

About the adversarial domain adaptation used in this model

Domain adaptation is an approach widely known in computer vision for applying a model trained in one domain (e.g., photography) to another domain (e.g., painting).

The paper uses an extension of this technique, namely adversarial domain adaptation (see below for details). This technique has been shown to remove biases specific to individual datasets and to allow models to be trained on large amounts of information drawn from different platforms.

Study Workflow

The model is trained on two types of tasks: a main task (labeled Main Task in the figure) and an adversarial task (labeled Adversarial Task in the figure).

The former (Main Task) uses the source domain to learn which of the four differentiation stages a cell is in. In this task, the model is trained to maximize the accuracy of the four-class classification that predicts the differentiation stage.

The latter (Adversarial Task) concerns discrimination between the source and target domains. Here the model is trained so that the accuracy of discriminating between the two datasets becomes as low as possible. Introducing this learning mechanism makes it possible to remove the bias specific to each dataset.
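
One common way to express these two opposing goals, following the gradient-reversal idea from the domain adaptation literature, is to subtract a weighted domain loss from the task loss seen by the feature extractor. The sketch below is only an illustration under that assumption: the function names, the weight lam, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under the predicted probabilities."""
    return -math.log(probs[label])

def encoder_objective(task_loss, domain_loss, lam=1.0):
    """Loss seen by the feature extractor: keep the task loss low
    while keeping the domain classifier's loss high (assumed formulation)."""
    return task_loss - lam * domain_loss

# Hypothetical predictions for a single source-domain cell.
stage_probs = [0.7, 0.1, 0.1, 0.1]   # four differentiation stages
domain_probs = [0.5, 0.5]            # [source, target]

task_loss = cross_entropy(stage_probs, label=0)
domain_loss = cross_entropy(domain_probs, label=0)
total = encoder_objective(task_loss, domain_loss, lam=1.0)
```

When the domain classifier is fully confused (probability 0.5 for each domain), the domain loss for a sample equals ln 2.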

The trained model is then used to score the source and target domains based on how well the drug is able to induce differentiation, as shown in Figure 2.

Then, as shown in Figure 3, the scores just obtained are used to determine suitable candidates for agents that promote cell differentiation and maintain stem cell characteristics.

Finally, as shown in Figure 4, six of the prioritized agents are selected and tested on real cells.
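
The prioritization step can be pictured as a simple ranking of drugs by their predicted score. The following is a minimal sketch, not the authors' procedure: the drug names and scores are made up, and the assumption that a higher score means higher priority is exposed as the hypothetical parameter higher_is_better.

```python
def prioritize(scores, k, higher_is_better=True):
    """Return the names of the k top-ranked drugs by score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=higher_is_better)
    return [name for name, _ in ranked[:k]]

# Hypothetical scores; real values would come from the trained model.
scores = {"drug_A": 0.91, "drug_B": 0.15, "drug_C": 0.78, "drug_D": 0.42}

selected = prioritize(scores, k=2)
```

Only the selected subset would then move on to experimental testing, which is where the savings in time and cost come from.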

Thus, by using deep learning to narrow down drug candidates before conducting experiments, the time and money spent on identifying drug candidates is expected to be significantly reduced.

Overall Model

This figure gives an overview of how the model is actually trained. As shown in the figure, the source domain dataset and the target domain dataset are first passed through separate encoders.

Next, the respective features obtained by the two types of encoders are integrated and used as input to train the decoder.

The output of the decoder is used as input to the subsequent task classifier (circled in green) and the adversarial domain classifier (circled in red).

In the task classifier, an MLP layer predicts which of the four differentiation stages applies. During training, one loss function (Task Classification Loss) is used to improve prediction accuracy and another (Domain Confusion Loss) to reduce the bias inherent in each domain.

Adversarial domain classifiers, on the other hand, learn to discriminate between source and target domains using a loss function (Adv. Domain Loss) that learns to minimize the accuracy of the discrimination.
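
To make the data flow concrete, here is a rough sketch of a forward pass through two encoders, a shared decoder, and the two classifier heads. It is an illustration only: the layer sizes, the untrained random weights, and the choice to route each sample through the encoder of its own domain are assumptions, not details confirmed by the paper.

```python
import math
import random

def linear(x, weights):
    """Dense layer without bias; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init(n_out, n_in, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)
N_GENES, N_HIDDEN, N_STAGES, N_DOMAINS = 6, 4, 4, 2  # hypothetical sizes

params = {
    "source_encoder": init(N_HIDDEN, N_GENES, rng),
    "target_encoder": init(N_HIDDEN, N_GENES, rng),
    "decoder": init(N_HIDDEN, N_HIDDEN, rng),
    "task_head": init(N_STAGES, N_HIDDEN, rng),     # differentiation stage
    "domain_head": init(N_DOMAINS, N_HIDDEN, rng),  # source vs. target
}

def forward(expression, domain):
    """Pass one expression profile through its encoder, the shared decoder, and both heads."""
    features = relu(linear(expression, params[domain + "_encoder"]))
    shared = relu(linear(features, params["decoder"]))
    stage_probs = softmax(linear(shared, params["task_head"]))
    domain_probs = softmax(linear(shared, params["domain_head"]))
    return stage_probs, domain_probs

profile = [0.2, 1.5, 0.0, 0.7, 0.3, 1.1]  # made-up expression values
stage_probs, domain_probs = forward(profile, "source")
```

Both heads read the same decoder output, which is what lets the domain-related losses shape the features that the task classifier relies on.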

About the Data Set

The source domain dataset systematically compiles information on human induced pluripotent stem cells (stem cells derived from adult somatic cells, abbreviated as hiPSCs), analyzed using scRNA-seq, a technique that measures gene expression in individual cells.

The target domain dataset (LINCS L1000), on the other hand, is a large-scale, systematic compilation of how cells respond to a wide range of stimuli, measured through changes in gene expression.

The latter dataset details how a particular drug affects cell function, e.g., how giving drug X (or Y) to cell A (or B) ultimately affects its function, as shown in the figure.

In this study, domain adaptation is used to first learn the pattern of cell differentiation from the source domain and then use that knowledge to predict the ability of each drug to induce differentiation.

Diagram of Learning Progression

This figure shows how learning progresses, with the horizontal axis representing training epochs and the vertical axis representing loss. The loss of the task classifier on the source domain is shown in blue, the accuracy of that classifier in light gray, the accuracy of the adversarial classifier on the source domain in gray, and its accuracy on the target domain in dark gray.

The task classifier (light gray) reaches an accuracy of 86.7%. As for the adversarial classifier, since the goal is to minimize its ability to distinguish between the two datasets, its accuracy on the source domain (gray) and on the target domain (dark gray) are both expected to approach 50%, and the figure shows that they do indeed converge to that value.
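
The 50% target is simply chance level for a two-way decision. The short sketch below computes per-domain accuracy for a hypothetical, fully confused domain classifier; the predictions are invented for illustration and do not come from the paper.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

SOURCE, TARGET = 0, 1

# Hypothetical domain predictions from a classifier that cannot tell the domains apart.
source_predictions = [SOURCE, TARGET, SOURCE, TARGET]  # true label is SOURCE
target_predictions = [TARGET, SOURCE, SOURCE, TARGET]  # true label is TARGET

source_accuracy = accuracy(source_predictions, [SOURCE] * 4)
target_accuracy = accuracy(target_predictions, [TARGET] * 4)
```

Here both accuracies come out at 0.5, which is the behavior the gray and dark gray curves are expected to converge to.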

Comparison before and after Domain Adaptation

Figure c above shows the results of the t-SNE analysis before (left panel) and after (right panel) domain adaptation. Blue, green, yellow, and red show the distributions obtained by clustering the source domain into four groups, while black shows the distribution of the dataset used as the target domain.

While cells at different stages of differentiation are widely dispersed before domain adaptation, after domain adaptation they are more distinguishable, and furthermore, the distribution of target domains is spread throughout the distribution of source domains.

Experimental Results

This figure shows the results of chemical experiments performed on the drug candidates predicted by the model.

First, Figure a shows the DECODE scores in red for the top 10 drug candidates predicted by this model, indicating their characteristics as stem cells, and the scores in blue for the bottom 10 drug candidates.

The higher the DECODE score, the stronger the stem cell characteristics, and this figure predicts that higher priority drug candidates will have higher stem cell characteristics.

In Figure b, we show the ratio of values before and after drug treatment for three indicators: the number of stem cell populations, the total area formed by the populations, and their average. This shows that the use of the drug candidates predicted by the model makes a difference in various indicators related to the properties of the stem cells.
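
The before/after comparison amounts to dividing each post-treatment value by its pre-treatment value. The sketch below shows that arithmetic with invented numbers; the indicator names and measurements are hypothetical and are not data from the paper.

```python
def treatment_ratio(before, after):
    """Ratio of each indicator after treatment to its value before treatment."""
    return {name: after[name] / before[name] for name in before}

# Hypothetical measurements for one drug.
before = {"population_count": 40, "total_area": 2000.0, "mean_area": 50.0}
after = {"population_count": 10, "total_area": 400.0, "mean_area": 40.0}

ratios = treatment_ratio(before, after)
```

A ratio below 1 means the indicator dropped after treatment, and a ratio near 1 means the drug made little difference for that indicator.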

In addition, the paper evaluates the effects of the high-priority drug candidates on breast cancer stem cells, showing that they inhibit the growth and self-renewal capacity of these cells. In other words, the tested drugs are confirmed to be effective.

Figures c through e are graphs showing the percentage of the 30 drug candidates predicted by the model to inhibit or promote processes related to cell function and cell structure. Inhibition of function is shown in red and promotion of function is shown in green.

These figures show that the drug candidates predicted by the model suppress the function of genes related to the cell cycle, but promote cell differentiation. In other words, the drug candidates are able to regulate the properties of cancer stem cells.

Figures a through c above illustrate how the model's predicted agents affect the ability of breast cancer stem cells to grow and replicate themselves.

Figure a represents the effect of the drugs visually. The top and bottom rows show representative examples of the cancer stem cells used in the experiment. The five columns show how much the stem cell characteristics are affected by gradually increasing concentrations of two drugs obtained from the model's predictions: "TRIPROLIDE" in the second and third columns from the left and "OTS-167" in the fourth and fifth columns.

The higher the drug concentration, the smaller the proportion of cell clumps appears to be (i.e., the drug suppresses the function of cancer stem cells, or in other words, is effective as a treatment).

Figure b quantifies this further, showing how the properties of one type of cancer stem cell (left half) and another (right half) change when three different drugs are used. (You can safely assume that the horizontal axis of each panel represents the drug concentration and the vertical axis represents the strength of the cancer stem cell characteristics.)

The case where no drug was added is shown in blue, the case where a small amount was added is shown in red, and the case where a large amount was added is shown in green. The figure confirms that green tends to score lower on the vertical axis than blue or red (i.e., cancer stem cell function can be inhibited, or in other words, the drugs predicted by the machine learning model are effective as a treatment).

Summary

This study proposed a model that uses adversarial domain adaptation, a machine learning approach, to identify drug candidates associated with the inhibition of cancer stem cell function. Specifically, it introduced an adversarial domain classifier that distinguishes whether data come from the source domain or the target domain, together with a loss function that reduces the bias between the two datasets during training.

The drug candidates selected based on the model's prediction scores were shown experimentally to affect cellular functions in ways that work against cancer, supporting the validity of the model.

To further this research, the authors aim to evaluate the therapeutic efficacy and safety of the drugs through clinical trials and to elucidate their molecular mechanisms from a chemical perspective.

Not only in this paper but in many others, machine learning is widely used to narrow down drug candidates before costly experimental validation, which speeds up drug discovery. If you are interested in this approach, we encourage you to read more related papers.