[FlagVNE] A Flexible And Generalizable Reinforcement Learning Framework For Virtual Network Embedding
3 main points
✔️ Bidirectional action-based MDP modeling for VNE improves solution space exploration.
✔️ Hierarchical policy architecture for adaptive action probability distribution generation and high learning efficiency.
✔️ A meta-reinforcement-learning-based training method with a curriculum scheduling strategy efficiently learns multiple size-specific policies and adapts quickly to unseen size distributions.
FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation
written by Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong
(Submitted on 19 Apr 2024)
Comments: Accepted by IJCAI 2024
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
code:
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
Network virtualization (NV) is an innovative technology that is gaining attention in areas such as 5G networking and cloud computing. NV enables the placement of multiple user-submitted virtual network requests (VNRs) on the same physical network through network slicing and shared infrastructure to meet a wide variety of network service requirements.
However, at the heart of this technology lies virtual network embedding (VNE), a highly challenging combinatorial optimization problem. The solution space explodes combinatorially, and the requests to be embedded differ widely: user services dynamically generate VNRs with diverse topologies and heterogeneous resource demands.
Recently, reinforcement learning (RL) has emerged as a promising approach to the VNE problem. However, existing RL-based VNE methods suffer from limited explorability and generalizability, caused by their unidirectional action design and one-size-fits-all training strategies.
In this paper, we propose FlagVNE, a new flexible and generalizable RL framework for VNE, which aims to increase the explorability of the solution space, learn specialized policies for VNRs of different sizes, and achieve rapid adaptation to unseen distributions. This innovative approach opens up new possibilities for VNE in complex network environments.
Related Research
Related studies fall into two categories: traditional and learning-based methods.
Traditional Methods
- Early work tackled VNE with exact methods such as integer linear programming, which proved impractical in large-scale real-world scenarios.
- Heuristic algorithms such as node ranking were then proposed. While these algorithms can find solutions in acceptable time, they rely heavily on manual design and are often tailored to specific scenarios, limiting their performance in the general case.
Learning-Based Methods
- Recently, machine learning techniques have been used to solve VNE, leading to faster and more efficient solutions.
- In particular, reinforcement learning (RL) has shown great potential as an intelligent decision-making framework that can effectively solve VNE through MDP modeling.
- The authors unify existing RL-based methods into a general framework consisting of three key elements: MDP modeling, policy architecture, and learning methods.
- These methods model the VNE solution construction process as a unidirectional action-based MDP, build policy models with various neural networks, and learn a single general policy for handling VNRs of various sizes.
- However, these existing methods suffer from limited explorability and generalizability due to the unidirectional action design and one-size-fits-all policy, which degrades overall performance.
Figure 1 shows an example of a VNE problem with multidimensional resources. It depicts multiple VNRs being mapped onto a physical network and illustrates the two phases of embedding: node mapping and link mapping.
Proposed Method (FlagVNE)
Figure 2 provides an overview of the FlagVNE framework: (a) shows the generalizable training method, and (b) shows the bidirectional action-based MDP modeling and the hierarchical policy architecture. The main components of the proposed method are:
1. Bidirectional action-based MDP modeling (Figure 2(b)):
- Proposes a new MDP formulation in which each action jointly selects a virtual node and a physical node.
- This increases the flexibility of the agent's exploration and exploitation and broadens the explorable solution space.
- A hierarchical decoder with a two-level policy (the next component) is designed to cope with the resulting large and variable action space; see the sketch after this list.
2. Hierarchical policy architecture (Figure 2(b)):
- The action is decomposed into two sub-decisions: the ordering of virtual nodes and their placement onto physical nodes.
- A hierarchical decoder implements this with a two-level policy: a high-level ordering policy and a low-level placement policy.
- This enables adaptive generation of action probability distributions and high learning efficiency.
3. Generalizable meta-reinforcement-learning-based training method (Figure 2(a)):
- Efficiently learns multiple size-specific policies and adapts quickly to new VNR sizes.
- After training a meta-policy, size-specific policies are rapidly fine-tuned from it for each VNR size, including previously unseen sizes.
- A curriculum scheduling strategy gradually incorporates larger VNRs during meta-training to mitigate early convergence to suboptimal solutions; a sketch of this schedule follows the policy example below.
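To make the bidirectional action concrete, here is a minimal PyTorch sketch of a hierarchical policy that first samples which virtual node to embed next (high-level ordering policy) and then samples a physical host for it (low-level placement policy). The class name, network shapes, feature inputs, and masking scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    """Two-level policy producing a bidirectional VNE action (virtual node, physical node)."""

    def __init__(self, v_feat_dim: int, p_feat_dim: int, hidden: int = 64):
        super().__init__()
        # High-level scorer: which virtual node to embed next.
        self.v_scorer = nn.Sequential(
            nn.Linear(v_feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Low-level scorer: which physical node hosts the chosen virtual node.
        self.p_scorer = nn.Sequential(
            nn.Linear(v_feat_dim + p_feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, v_feats, p_feats, v_mask, p_mask):
        # v_feats: (num_virtual, v_feat_dim); p_feats: (num_physical, p_feat_dim)
        # v_mask / p_mask: bool tensors, True for nodes that are still selectable.
        v_logits = self.v_scorer(v_feats).squeeze(-1).masked_fill(~v_mask, float("-inf"))
        v_dist = torch.distributions.Categorical(logits=v_logits)
        v = v_dist.sample()  # high-level action: next virtual node to place

        # Condition the placement decision on the chosen virtual node's features.
        cond = torch.cat([v_feats[v].expand(p_feats.size(0), -1), p_feats], dim=-1)
        p_logits = self.p_scorer(cond).squeeze(-1).masked_fill(~p_mask, float("-inf"))
        p_dist = torch.distributions.Categorical(logits=p_logits)
        p = p_dist.sample()  # low-level action: physical host for v

        # Joint log-probability of the (v, p) pair, used in the policy-gradient update.
        return v.item(), p.item(), v_dist.log_prob(v) + p_dist.log_prob(p)
```

Sampling the virtual node and its host jointly at each step, rather than following a fixed virtual-node order, is what widens the explorable solution space compared with unidirectional designs.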
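The curriculum idea can be sketched just as simply: the pool of VNR sizes (meta-training tasks) grows as training progresses, so small, easier requests dominate early updates and large ones are phased in gradually. The linear schedule and all parameter names below are assumptions for illustration; the paper's exact scheduling rule may differ.

```python
import random

def sample_vnr_size(epoch: int, min_size: int = 2, max_size: int = 10,
                    warmup_epochs: int = 100) -> int:
    """Illustrative curriculum scheduler: linearly widen the range of
    VNR sizes used as meta-training tasks as training progresses."""
    progress = min(1.0, epoch / warmup_epochs)
    current_max = min_size + int(progress * (max_size - min_size))
    return random.randint(min_size, current_max)

# Sketch of the meta-training loop structure: each epoch, sample a
# size-specific task, fine-tune the meta-policy on VNRs of that size,
# and fold the resulting gradients back into the meta-policy.
for epoch in range(120):
    task_size = sample_vnr_size(epoch)
    # inner_policy = adapt(meta_policy, task=task_size)   # size-specific fine-tuning (hypothetical helper)
    # meta_update(meta_policy, inner_policy)              # meta-gradient step (hypothetical helper)
```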
Experiments
Figure 3 shows the performance of all algorithms under various levels of traffic throughput: as the VNR arrival rate increases, the request acceptance rate (RAC) decreases for all algorithms, but FlagVNE consistently achieves the best performance. FlagVNE's improvement is especially pronounced when competition for resources is intense.
Table 1 presents the results of the ablation study, showing that each component of FlagVNE contributes to the final performance gains; in particular, it confirms the effectiveness of the meta-reinforcement-learning-based training method and the curriculum scheduling strategy. These results demonstrate that FlagVNE succeeds in improving explorability and generalizability.
Conclusion
In this paper, we propose FlagVNE, a new RL framework for VNE that improves explorability and generalizability. Experimental results show that FlagVNE enables effective resource allocation in complex network environments. Future work includes applying FlagVNE to larger and more dynamic network scenarios to further validate its effectiveness; another interesting direction is extending FlagVNE to other resource management problems.