AI Will Solve The Electricity Supply-demand Conundrum In The Era Of Mass EV Proliferation!
3 main points
✔️ The EV network charging control problem is modeled as a Dec-POMDP, and DDPG-based centralized and distributed multi-agent reinforcement learning methods are applied.
✔️ Theoretical analysis reveals that the centralized method has a larger variance of policy gradients but can mitigate non-stationarity through cooperative learning.
✔️ Simulation evaluation demonstrated that the centralized method is superior in charging cost, charging pattern smoothness, and fairness, and can be applied to large networks.
Centralized vs. Decentralized Multi-Agent Reinforcement Learning for Enhanced Control of Electric Vehicle Charging Networks
written by Amin Shojaeighadikolaei, Zsolt Talata, Morteza Hashemi
(Submitted on 18 Apr 2024)
Comments: 12 pages, 9 figures
Subjects: Artificial Intelligence (cs.AI)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
As electric vehicles (EVs) become increasingly popular, there is a risk that electricity demand during peak hours will rise sharply. Controlling EV charging appropriately so as to minimize electricity use during peak hours is therefore an important problem. Traditionally, model-based methods and single-agent reinforcement learning methods have been used for EV charging control, but they face challenges in dealing with uncertainty, privacy, and scalability.
Therefore, this paper proposes a distributed, cooperative EV charging control method based on multi-agent reinforcement learning (MARL). Theoretical analysis and numerical performance evaluation of the proposed method show the superiority of the centralized approach and that it works effectively in realistic situations with a large number of EV users.
Related Research
We broadly categorize previous research on EV charging control into model-based approaches and model-free reinforcement learning approaches.
Model-Based Approach
- Methods such as binary optimization, mixed integer linear programming, robust optimization, stochastic optimization, model predictive control, and dynamic programming have been proposed.
- These methods require accurate system models.
Reinforcement Learning Approach
- Single-agent reinforcement learning methods: deep Q-learning, Bayesian neural networks, advantage actor-critic, DDPG, and others have been applied. These assume full observability of the network, which is impractical from a privacy perspective.
- Multi-agent reinforcement learning methods: some studies have applied MARL to learning pricing strategies among multiple EV stations and charging operators, but many do not consider coordination among agents. Others have attempted inter-agent coordination through federated reinforcement learning, yet they assume that each agent can observe the entire network demand.
Based on these previous studies, the novelty of this paper is that it proposes a decentralized MARL method that can protect privacy at runtime while considering cooperation.
Proposed Method
In the proposed method, each EV user in the EV network has a reinforcement learning agent mounted on a smart meter, as shown in Figure 1. The network has two layers, consisting of a physical power layer and a control layer.
At the physical power layer, all EVs are connected to the upstream power grid (utility company) via a shared transformer. At the control layer, RL agents installed in each EV user's smart meter are responsible for efficiently managing and coordinating EV charging based on dynamic power prices and physical layer constraints (e.g., shared transformers).
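As a rough, hedged illustration of this two-layer setup (our own sketch, not code from the paper), each agent's local observation in the Dec-POMDP could contain only its own charging state plus the broadcast price, while the price itself depends on the aggregate load drawn through the shared transformer; all names and the price form below are assumptions.

```python
# Minimal sketch (not the paper's exact formulation): one possible shape of an
# agent's local observation, and a hypothetical dynamic price that rises with
# the aggregate load on the shared transformer.
from dataclasses import dataclass

@dataclass
class LocalObservation:
    time_step: int           # position within the charging horizon
    soc: float               # agent's own battery state of charge in [0, 1]
    remaining_demand: float  # energy still required before departure (kWh)
    price: float             # current electricity price broadcast by the utility

def dynamic_price(total_load_kw: float, base_price: float = 0.10,
                  sensitivity: float = 0.02) -> float:
    """Hypothetical quadratic price signal: cost per kWh grows with the
    aggregate charging load drawn through the shared transformer."""
    return base_price + sensitivity * total_load_kw ** 2
```

The point of the sketch is that each agent sees only local quantities plus the broadcast price; no agent directly observes the other users' demands.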
As specific control strategies, this paper proposes two multi-agent DDPG methods, a centralized method (CTDE-DDPG) and a distributed method (I-DDPG).
Independent DDPG (I-DDPG)
- Completely decentralized approach.
- Each agent has its own actor-critic network and treats other agents as part of the environment.
- Low computational cost and small variance in policy gradients, but susceptible to non-stationarity.
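To make the independent setting concrete, the sketch below (an illustration under assumed dimensions, not the authors' implementation) shows a per-agent critic that scores only that agent's own observation and action; because the other agents' behavior never enters the critic, they are effectively part of the environment, which is the source of the non-stationarity mentioned above.

```python
# Minimal PyTorch sketch of the independent (I-DDPG) critic: each agent i learns
# Q_i(o_i, a_i) from its *own* observation and action only. Dimensions and
# network sizes are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class IndependentCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Score only agent i's local observation-action pair.
        return self.net(torch.cat([obs, act], dim=-1))

# Standard DDPG TD target for agent i (gamma and target nets assumed):
# y_i = r_i + gamma * Q_i'(o_i', mu_i'(o_i'))  -- uses only agent i's own data.
```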
Centralized Training Decentralized Execution DDPG (CTDE-DDPG)
- Agents cooperate only during training; execution is fully decentralized.
- All agents share a centralized value function, i.e., the critic network is centralized.
- Cooperation among agents mitigates the effects of non-stationarity, but at higher computational cost and with larger policy-gradient variance.
In the CTDE-DDPG framework shown in Figure 2, information is shared among agents only during the training phase, and each agent operates independently during the execution phase. During training, every agent has access to the observations and actions of all agents; during execution, it does not. Each agent has an actor-critic network: in training, the actor selects actions based on local observations, and these actions are evaluated by a centralized value function, the shared critic network. At runtime, the actors are decentralized and decide on actions based solely on local information.
Thus, in CTDE-DDPG, cooperation during learning mitigates non-stationarity among agents while protecting privacy during execution. In I-DDPG, on the other hand, agents learn and execute independently of each other.
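The following sketch contrasts this with the CTDE setup (again our own hedged illustration rather than the paper's code): the critic used during training conditions on the joint observations and actions of all N agents, while the actor that remains at execution time sees only local information.

```python
# Minimal PyTorch sketch of the CTDE-DDPG idea: the critic for agent i is trained
# on the joint observations and actions of all N agents, while the actor only
# ever sees its local observation, so execution stays fully decentralized.
# Sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q_i(o_1..o_N, a_1..a_N): used only during the training phase."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        # Concatenate all agents' observations and actions before scoring.
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

class LocalActor(nn.Module):
    """mu_i(o_i): the only component needed at execution time."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # normalized charging rate, rescaled outside
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```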
Experiment
Experimental Setup
We simulated an EV network based on the IEEE 5-bus system and evaluated scenarios with up to 20 agents (EV users). The charging phase consists of 34 steps. Table I shows the hyperparameters of the DDPG.
General Performance
Figure 4 shows the average remaining battery capacity for the 10 agents. This figure shows that both methods meet the requirements of EV users.
Impact of The Cooperative Value Function
Figure 5 shows the average charging rates for the 10 agents: the I-DDPG exhibited an oscillatory charging pattern, while the CTDE-DDPG exhibited a smooth charging pattern. The total variation (referred to as TV), defined by equation (21), was about 36% smaller for CTDE-DDPG.
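As a rough illustration of this smoothness metric, the snippet below assumes a standard total-variation form (the precise definition is the paper's equation (21)): it sums the absolute step-to-step changes in the charging rate, so an oscillatory schedule scores high and a smooth ramp scores low.

```python
# Hedged sketch of the smoothness metric; the standard total-variation form is
# assumed here rather than copied from the paper.
import numpy as np

def total_variation(charging_rates: np.ndarray) -> float:
    """TV = sum_t |a_{t+1} - a_t| over the charging horizon."""
    return float(np.abs(np.diff(charging_rates)).sum())

# A schedule that ramps up once is "smoother" than one that oscillates,
# even if both deliver comparable total energy.
smooth = np.linspace(0.0, 1.0, 34)       # 34-step charging horizon
oscillating = np.tile([0.0, 1.0], 17)
print(total_variation(smooth), total_variation(oscillating))  # 1.0 vs 33.0
```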
Figure 6 shows the average electricity price and Figure 7 the average daily charging cost. As the number of agents increased, CTDE-DDPG tended to achieve lower prices and lower charging costs.
Convergence and Fairness
Figure 8 shows the average episode reward. Both methods converged to a similar level of reward, but the variance tended to be larger for CTDE-DDPG.
Figure 9 shows the performance ratio (fairness) of the worst and best agents. As the number of agents increased, the fairness of I-DDPG decreased, while CTDE-DDPG maintained good fairness.
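The snippet below is a hedged sketch of such a worst-to-best performance ratio (the paper's exact definition may differ in detail); values close to 1 mean that all agents fare similarly, i.e., better fairness.

```python
# Hedged sketch of a fairness measure: ratio between the worst- and
# best-performing agents' returns (assumed positive here).
import numpy as np

def fairness_ratio(per_agent_returns: np.ndarray) -> float:
    """Worst-to-best performance ratio across agents; 1.0 means perfectly even."""
    return float(per_agent_returns.min() / per_agent_returns.max())

# Illustrative (made-up) numbers: a more even spread of returns yields a ratio
# closer to 1, i.e. better fairness.
print(fairness_ratio(np.array([9.0, 9.5, 10.0])))   # 0.9
print(fairness_ratio(np.array([4.0, 9.0, 10.0])))   # 0.4
```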
As in the theoretical analysis, the variance of the policy gradient was larger in CTDE-DDPG, but cooperative learning was able to mitigate the non-stationarity. This coordination contributed to smoothed charging patterns, stabilized prices, and improved fairness. CTDE-DDPG performed robustly even as the number of agents increased. These experimental results show that CTDE-DDPG is effective as a distributed and coordinated charging control method and can be applied to large-scale EV networks.
Conclusion
We propose centralized and distributed multi-agent reinforcement learning methods for EV network charging control and demonstrate their effectiveness both theoretically and experimentally. The centralized method achieves efficient, cooperation-based charging control while also preserving privacy during execution, and it was confirmed to work robustly in large EV networks.
Future work should consider applications to more complex environments and other control objectives, as well as further theoretical refinement and analysis of the model.