A Proposed Model For Identifying Asymptomatic COVID-19 Infected Individuals Using Reinforcement Learning!

Reinforcement Learning 16/03/2022

3 main points
✔️ Asymptomatic infections are difficult to detect and screen out at the border, which poses a challenge.
✔️ We report on the design and performance of a reinforcement learning system - Eva.
✔️ Identifying asymptomatic infected patients in real-time is expected to be effective in policy-making decisions.

Efficient and targeted COVID-19 border testing via reinforcement learning
written by Hamsa Bastani, Kimon Drakopoulos, Vishal Gupta, Ioannis Vlachogiannis, Christos Hadjicristodoulou, Pagona Lagiou, Gkikas Magiorkinis, Dimitrios Paraskevis, Sotirios Tsiodras
(Submitted on 22 Sep 2021)
Comments: Nature.

The images used in this article are from the paper, the introductory slides, or were created based on them.

background

Can we prevent asymptomatic infections of COVID-19?

In this study, we aim to develop a system that can identify COVID-19 asymptomatic infected individuals with higher accuracy using reinforcement learning (RL).

To control the SARS-CoV-2 pandemic, many countries imposed restrictions on non-essential travel. These restrictions were subsequently relaxed using a combination of four categories: white list, travel is unrestricted; gray list, travelers must provide proof of a negative reverse transcription PCR (RT-PCR) test before arrival; red list, travelers must quarantine upon arrival; and black list, non-essential travel is banned. The decision on which list to assign differs between the countries making it and is often based on published population-level epidemiological indicators: number of cases per population, number of deaths per population, and positivity rates. However, these indicators have been noted to be incomplete, with problems such as underreporting, bias toward symptomatic populations, and delayed reporting.

To address these challenges, we develop a reinforcement learning system, Eva, that uses the information travelers submit in the passenger locator form (PLF) to propose a border testing policy targeting asymptomatic infected travelers. Eva produces real-time estimates of COVID-19 prevalence and uses them to derive border policies that identify asymptomatic infected travelers with high accuracy. Unlike conventional restriction protocols, test assignments can be made from limited information only, namely the demographics of incoming travelers and historical test results. The system aims to identify asymptomatic infected travelers and provide real-time information to policymakers for downstream decision-making.

What is reinforcement learning?

First, we outline reinforcement learning, which is used in the proposed method in this study.

Reinforcement learning (RL) is a type of machine learning built around two components, the agent and the environment: the agent performs an action, and the environment provides feedback on that action in the form of a reward. Through this interaction the agent learns about the environment and its rewards and derives a course of action (a policy) that maximizes the reward. The main feature of RL is its low dependence on datasets: unlike unsupervised and supervised learning, RL does not require a static dataset, since it learns from the experience the agent collects through feedback from the environment. That is, data collection, preprocessing, and labeling are not required before learning.

The workflow of RL is generally as follows.

(1) Creating the environment: Define the environment in which the agent operates - the interface between the agent and the environment, etc. In terms of safety and experimental feasibility, simulation is often introduced.

(2) Definition of reward: Define the reward with respect to the goal, and determine how to calculate it. The reward guides the agent's choice of actions.

(3) Agent creation: Define an agent, which consists of a policy and a reinforcement learning algorithm. Specifically: a. choose a way to represent the policy, such as a neural network or a lookup table; b. choose an appropriate learning algorithm. Neural networks are generally used to represent the policy because they are better suited to learning in large state and action spaces.

(4) Agent training and validation: Set the conditions for learning (e.g., stopping conditions) and train the agent. After training, validate the policy derived by the agent, revisit the design of the reward signal and the policy representation, and rerun the training if necessary. RL is not sample-efficient, especially for model-free and on-policy algorithms, and training can take anywhere from several minutes to several days; therefore, training is often parallelized across multiple CPUs, GPUs, or clusters.

(5) Deployment of the policy: Examine the learned policy. Based on the results, we may return to an earlier stage of the workflow. Specifically, if the learning process and the derived policy do not converge within the available computation time, the following items need to be updated before retraining: learning settings; configuration of the reinforcement learning algorithm; policy representation; definition of the reward signal; action and observation signals; and dynamics of the environment.
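The workflow above can be illustrated with a minimal sketch. The toy environment, the class names, and the epsilon-greedy rule below are all assumptions made for illustration and are not taken from the paper: the environment returns a 0/1 reward, the agent stores its policy as a lookup table, and training is a simple loop of act, observe reward, and update.

```python
import random


class TwoArmEnvironment:
    """Toy environment (hypothetical): each action pays reward 1 with a hidden probability."""

    def __init__(self, success_probs, seed=0):
        self.success_probs = success_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Feedback from the environment: a 0/1 reward for the chosen action.
        return 1.0 if self.rng.random() < self.success_probs[action] else 0.0


class LookupTableAgent:
    """Agent whose policy is a lookup table of average observed reward per action."""

    def __init__(self, n_actions, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental update of the running mean reward for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def train(env, agent, n_steps):
    """Run the agent-environment loop and return the total reward collected."""
    total = 0.0
    for _ in range(n_steps):
        action = agent.act()
        reward = env.step(action)
        agent.learn(action, reward)
        total += reward
    return total


env = TwoArmEnvironment(success_probs=[0.2, 0.8])
agent = LookupTableAgent(n_actions=2)
total_reward = train(env, agent, n_steps=2000)
print(agent.values, agent.counts, total_reward)
```

No dataset is prepared in advance here: everything the agent knows comes from the rewards it collects during the loop, which is the property described above.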

What is SARS-CoV-2?

In this chapter, we outline the analysis target, SARS-CoV-2.

SARS-CoV-2 is the name of the virus that causes COVID-19. It was first identified in 2019 in Wuhan, China, and subsequently spread worldwide, causing a pandemic. Symptoms typically appear about four to five days after infection, and in some cases as long as two weeks after, while asymptomatic cases have also been reported. The main symptoms are fever, cough, breathlessness, tiredness, chills, muscle aches, headache, sore throat, and loss of smell and taste. Older people and those with underlying medical conditions such as heart disease and diabetes are more likely to develop severe pneumonia, while other age groups have reported respiratory symptoms, high fever, diarrhea, and loss of taste. Infections in children are often mild or subclinical, but children can still be infected, and transmission to the elderly from asymptomatic carriers has been reported. By September 2021, there were 220 million confirmed cases and 4.55 million deaths worldwide. The virus spreads from person to person, mainly through the air via coughs and respiratory droplets. Highly effective vaccines have been developed.

purpose of the research

We aim to develop a system to identify COVID-19 asymptomatic infected individuals with higher accuracy using reinforcement learning (RL). The guidelines currently used for lifting restrictions are generally based on epidemiological indicators (number of cases per population, number of deaths per population, and positivity rate), but these have been pointed out to be incomplete because of underreporting, bias toward symptomatic populations, and delayed reporting. In this study, we aim to estimate the number of asymptomatic infected people by using reinforcement learning and traveler information that contains no personal information. Specifically, we estimate the prevalence of COVID-19 in real time based on passenger information and use these estimates to set border policies that identify asymptomatic infected people with higher accuracy. Such a system is shown to outperform internationally proposed border control policies based on epidemiological indicators.

technique

In this section, we describe the proposed method, Eva, which derives a testing policy for identifying asymptomatic COVID-19 infected travelers. Eva was deployed at all 40 entry points in Greece, including airports, land crossings, and seaports, from August 6 to November 1, 2020 (see figure below). The method is based on the analysis of the passenger locator form (PLF), which must be completed at least 24 hours before arrival (one per household) and which includes information on the country of departure, demographics, and the place and date of entry.

Estimating prevalence by traveler type

In this chapter, we describe the methodology for estimating prevalence in Eva.

Eva estimates COVID-19 prevalence from the test results of previously tested travelers. The prevalence estimation consists of two steps.

(1) LASSO regression, a technique from high-dimensional statistics, is used to adaptively extract a minimal set of traveler types based on demographic characteristics: country, region, age, and gender. These types are updated every week based on test results.

(2) An empirical Bayes method, in which the prior distribution is estimated from previously observed data, is used to estimate the prevalence of each type. In the setting where Eva is deployed, the prevalence of COVID-19 is low (about 2 in 1,000) and arrival rates vary widely across countries; therefore, the test data are imbalanced (few positive cases among those tested) and sparse (few arrivals from certain countries). Given these data characteristics, an empirical Bayes method is used to process the data sequentially and support appropriate decisions.
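To make step (2) concrete, the following is a minimal sketch of an empirical Bayes prevalence estimate under a Beta-Binomial model. The method-of-moments prior fit, the function names, and the example counts are assumptions for illustration; the paper's actual estimator may differ. The idea shown is shrinkage: types with few tests are pulled toward the overall average, while types with many tests keep estimates close to their observed rate.

```python
from statistics import mean, pvariance


def fit_beta_prior(positives, tests):
    """Fit a Beta(alpha, beta) prior to per-type positive rates by method of moments.

    This is a rough fit (it ignores binomial sampling noise) used only for illustration.
    """
    rates = [x / n for x, n in zip(positives, tests) if n > 0]
    m = mean(rates)
    v = pvariance(rates)
    if v <= 0 or v >= m * (1 - m):
        # Degenerate case: fall back to a weak prior centred on the mean rate.
        strength = 1.0
    else:
        strength = m * (1 - m) / v - 1
    return m * strength, (1 - m) * strength


def posterior_prevalence(positives, tests, alpha, beta):
    """Posterior mean prevalence per type: (alpha + x) / (alpha + beta + n)."""
    return [(alpha + x) / (alpha + beta + n) for x, n in zip(positives, tests)]


# Hypothetical counts for four traveler types: sparse (5 tests) and dense (3000 tests).
positives = [0, 2, 1, 12]
tests = [5, 1000, 40, 3000]

alpha, beta = fit_beta_prior(positives, tests)
estimates = posterior_prevalence(positives, tests, alpha, beta)
for x, n, est in zip(positives, tests, estimates):
    print(f"{x}/{n} positive -> raw {x / n:.4f}, shrunk {est:.4f}")
```

In this example the type with 0 positives out of 5 tests no longer receives an estimate of exactly zero, and the type with 1 positive out of 40 is pulled down from its noisy raw rate of 2.5%.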

Inspection Assignment

In this chapter, we describe a prevalence-based approach to test assignment.

Using the prevalence estimates described above, Eva selects the subset of travelers who should be given a PCR test at arrival, based solely on their traveler type. The assignment balances two competing objectives, known as the exploration-exploitation trade-off.

(1) Maximize the number of infected asymptomatic travelers identified, based on current information (exploitation).

(2) Assign tests to traveler types for which accurate estimates are not yet available, in order to assess and update their prevalence estimates more accurately (exploration).

Under this trade-off, a greedy allocation that concentrates testing on the types with the highest estimated prevalence would collect no data on moderately prevalent types. Since the prevalence of COVID-19 can spike quickly, data on these moderate-prevalence types must also be collected for proper learning. We view this challenge as a multi-armed bandit problem in RL, specifically a batched bandit problem that is non-stationary and contextual, with delayed feedback and constraints. It is also necessary to account for information from pipeline tests, that is, tests that have been allocated but whose results have not yet been returned. To resolve the exploration-exploitation trade-off, we build an algorithm based on the Gittins index: for each type, a deterministic index is computed that represents a risk score incorporating both the estimated prevalence and its uncertainty, and tests are assigned according to this value.
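The index idea can be sketched as follows. This is not the Gittins index used by Eva; as a simpler stand-in, the score below is the posterior mean plus a multiple of the posterior standard deviation of a Beta distribution, and pending (pipeline) tests are assumed to shrink the uncertainty without moving the mean. All names, parameters, and numbers are hypothetical.

```python
import math


def risk_index(alpha, beta, pending=0, optimism=1.0):
    """Mean-plus-uncertainty score for a type with a Beta(alpha, beta) posterior.

    Assumption: pending tests are treated as if they will come back at the
    current mean, so they reduce the uncertainty term but not the mean.
    """
    mean = alpha / (alpha + beta)
    n_eff = alpha + beta + pending
    std = math.sqrt(mean * (1 - mean) / (n_eff + 1))
    return mean + optimism * std


def allocate_tests(posteriors, arrivals, budget, optimism=1.0):
    """Assign tests to types in decreasing order of index until the budget runs out."""
    scores = {t: risk_index(*posteriors[t], optimism=optimism) for t in posteriors}
    allocation = {t: 0 for t in posteriors}
    for t in sorted(scores, key=scores.get, reverse=True):
        allocation[t] = min(arrivals.get(t, 0), budget)
        budget -= allocation[t]
    return allocation


# Hypothetical posteriors (alpha, beta): type D has a lower mean than A but far less data.
posteriors = {
    "A": (5.0, 995.0),
    "B": (1.0, 49.0),
    "C": (2.0, 1998.0),
    "D": (0.2, 49.8),
}
arrivals = {"A": 300, "B": 40, "C": 500, "D": 30}

with_exploration = allocate_tests(posteriors, arrivals, budget=100, optimism=1.0)
greedy = allocate_tests(posteriors, arrivals, budget=100, optimism=0.0)
print(with_exploration)  # {'A': 30, 'B': 40, 'C': 0, 'D': 30}
print(greedy)            # {'A': 60, 'B': 40, 'C': 0, 'D': 0}
```

With `optimism=0.0` the allocation is purely greedy and the poorly understood type D is never tested; with the uncertainty term included, D receives tests, which is the exploration behavior described above.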

Gray list recommendations

In this section, we describe how Eva informs the gray list of countries at high risk of infection.

Based on its prevalence estimates, Eva recommends which high-risk countries should be graylisted. Mandatory PCR testing reduces the prevalence among arriving travelers, but the cost of testing also significantly reduces non-essential travel. Therefore, Eva recommends graylisting countries only when necessary to reduce the burden on contact tracing teams while keeping the number of asymptomatic cases low. In theory, the graylist cutoff could be determined automatically, but that would make it difficult to respond to requests from decision-makers; therefore, the gray list is designed to allow some flexibility and to accommodate human input.

Closing the loop

In this section, we describe how test results are fed back into the proposed method to complete one update cycle.

Test results were recorded within 24-48 hours and used to update prevalence estimates from previous steps. During peak months - August and September - 41,830 (±12,784) PLFs were processed per day and 16.7% (±4.8%) of arriving households were tested per day.

result

In this section, we describe the performance evaluation we performed in this study: specifically, comparison with random surveillance of asymptomatic infected patients; performance evaluation of reinforcement learning; and examination of epidemiological indicators.

Comparison with random surveillance of asymptomatic infected patients

In this section, we compare the proposed method, Eva, with random surveillance testing, a common general-purpose approach, in terms of identifying asymptomatic infected travelers. Random surveillance is used as the comparison because it requires no information infrastructure and is frequently used. Performance under random surveillance was estimated using inverse propensity weighting (IPW), which weights each tested traveler by the inverse of the probability that they were selected for testing (see figure below). At the peak of the tourist season, random surveillance testing would have identified 54.1% of the infected travelers identified by Eva; in other words, random surveillance would have required roughly 85% more tests at each entry point to achieve the same effect as Eva (1/0.541 ≈ 1.85). In October, when arrival rates were lower, the relative performance of random surveillance improved to 73.4%.
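The IPW comparison can be sketched as follows, under simplifying assumptions: every tested traveler has a known selection probability, and random surveillance uses the same number of tests spread uniformly over all arrivals. The function names and the example data are hypothetical and are not the paper's evaluation code.

```python
def ipw_total_infected(records):
    """Estimate the number of infected arrivals by inverse propensity weighting.

    records: list of (tested_positive, propensity) for tested travelers, where
    propensity is the probability that the traveler was selected for testing.
    """
    return sum(1.0 / p for positive, p in records if positive)


def random_surveillance_ratio(records, n_arrivals):
    """Infections random testing would be expected to find, relative to targeted testing."""
    n_tests = len(records)
    caught_by_targeting = sum(1 for positive, _ in records if positive)
    if caught_by_targeting == 0:
        raise ValueError("no positives among tested travelers; ratio is undefined")
    # Random testing with the same budget finds infections in proportion to coverage.
    expected_random = (n_tests / n_arrivals) * ipw_total_infected(records)
    return expected_random / caught_by_targeting


# Hypothetical day: 1,000 arrivals, 100 tests.
# Low-risk types were tested with probability 0.05, high-risk types with 0.3.
records = (
    [(False, 0.05)] * 39 + [(True, 0.05)] * 1
    + [(False, 0.3)] * 57 + [(True, 0.3)] * 3
)

ratio = random_surveillance_ratio(records, n_arrivals=1000)
print(f"random surveillance finds {ratio:.0%} of what targeted testing found")
```

In this made-up example the estimated number of infected arrivals is 1/0.05 + 3/0.3 = 30, so 100 random tests among 1,000 arrivals would be expected to find 3 infections, against the 4 found by targeted testing.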

These differences in performance can be explained by changes in the relative scarcity of testing resources (see figure below). As the number of arrivals decreases, the proportion of arrivals tested increases and the value of targeting tests decreases. This suggests that Eva is most effective when testing resources are scarce.

Performance Evaluation of Reinforcement Learning

Next, we evaluate the performance of reinforcement learning introduced in this study.

Here, we compare the performance of Eva with policies based on population-level epidemiological indicators that also utilize the PLF (see above). These benchmark policies test passengers with a probability proportional to the number of cases per population, the number of deaths per population, or the positivity rate of the passenger's country of origin, while accounting for testing capacity at entry points and arrival constraints. We considered these three policies and estimated their performance using IPW (see figure below).

During the peak tourist season (August and September), the benchmark policies identified the following shares of the infected travelers identified by Eva, as estimated by IPW: cases-based, 69.0% (±9.4%); deaths-based, 72.7%; positivity-based, 79.7%. In other words, Eva identified more infected travelers than each of them. When the arrival rate decreased in October, these shares rose as follows: cases-based, 91.5% (±11.7%); deaths-based, 88.8% (±10.5%); positivity-based, 87.1% (±10.4%). The results indicate that Eva's advantage was larger when testing resources were scarce; indeed, the relative improvement from Eva was greatest in the second half of the peak season, when infection rates were high and testing resources were scarce.

Examination of epidemiological indicators

This chapter examines epidemiological indicators in the policy.

As mentioned above, existing policies based on epidemiological indicators are less precise in their estimates of prevalence, but it is possible that these indicators could be used in a better way. To examine this, we compare the indicators against Eva's prevalence estimates: specifically, each country is labeled as high risk (prevalence >0.5%) or low risk (prevalence <0.5%), and we assess the extent to which the epidemiological indicators can reproduce this two-way classification. This classification corresponds to the selection of countries to be graylisted or blacklisted. We computed labels for each time point and then investigated the predictive accuracy of gradient boosted machines trained on different subsets of covariates drawn from the 14-day time series of cases, deaths, testing rates, and test positivity rates per capita (see below).

Here, a model that uses no data has an area under the receiver operating characteristic curve (AUC) of 0.5. The AUCs of models 1-4 are close to 0.5, which suggests that they do not capture information on prevalence among asymptomatic travelers. In addition, model 5, which adds country-level fixed effects, showed improved accuracy. These fixed effects model country-specific idiosyncrasies that are not observed in the epidemiological data, such as testing strategies, social distancing, and other non-pharmaceutical interventions. Therefore, the results suggest that unobserved factors may be important for classifying countries as high or low risk.
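To see why an AUC of 0.5 means "no information", the following small sketch computes the AUC directly from its definition: the probability that a randomly chosen high-risk country receives a higher score than a randomly chosen low-risk one, with ties counted as one half. The labels and scores are made up for illustration and are unrelated to the paper's models.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count pairwise wins; a tie contributes one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = high risk, 0 = low risk.
labels = [1, 0, 1, 0, 0, 1]

constant_scores = [0.5] * 6                       # a model that uses no data
perfect_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]   # ranks every high-risk country first
partial_scores = [0.9, 0.8, 0.3, 0.4, 0.1, 0.7]   # some ranking errors

print(auc(labels, constant_scores))  # 0.5
print(auc(labels, perfect_scores))   # 1.0
print(auc(labels, partial_scores))
```

A model that assigns the same score to every country cannot rank them, so every pair is a tie and the AUC is exactly 0.5.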

Evaluation of gray list registration

In this chapter, we describe the extent to which infection can be prevented by registration on the gray list.

Eva's prevalence estimates were used to detect high-risk areas and to adjust travel protocols by graylisting the affected countries. The authors report that Eva's graylisting prevented 6.7% (±1.2%) of infected individuals from entering the country.

discussion

In this study, we aim to develop a system that can identify asymptomatic infected patients in COVID-19 with higher accuracy by utilizing reinforcement learning (RL).

To control the SARS-CoV-2 pandemic, many countries have restricted non-essential travel. However, it has been pointed out that the indicators used in these entry restriction protocols are incomplete, with issues such as underreporting, bias toward symptomatic populations, and delayed reporting. To address these issues, we have developed Eva, a reinforcement learning system based on traveler information, to derive testing protocols that target asymptomatic infected travelers. Eva uses reinforcement learning to estimate COVID-19 prevalence in real time and to limit the influx of asymptomatic SARS-CoV-2 infected travelers. Evaluation results show that the system performs better than existing methods, including random surveillance testing and policies based on epidemiological indicators. Such a system is expected to be effective in identifying asymptomatic infected individuals and providing real-time information for decision-making in policymaking.

One of the challenges in this research is the large learning cost. In general, the learning cost of RL is much larger than that of other methods in terms of both the amount of data and the training time. In national-level data analysis such as this study, a large amount of data must be collected and processed; therefore, if the model had to be redesigned every time it was modified, the operational cost would likely increase. In Eva, type extraction, prevalence estimation, and test assignment are designed in a modular manner, so each module can be replaced or recombined to achieve higher performance.