COAgents: Multi-Agent Framework Masters VRP Search Space

Summary

Vehicle Routing Problems (VRP) are fundamental to logistics and supply chain management but remain computationally intractable at scale. Traditional heuristics rely on handcrafted rules and struggle to generalize across diverse instances. COAgents introduces a cooperative multi-agent framework that models the search process as a dynamically constructed graph, where nodes represent candidate solutions and edges represent either local refinements or large perturbations (jumps). Three specialized agents (Node Selection, Move Selection, and Jump) collaborate to balance intensification and exploration, learning to navigate the search space efficiently.
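
The search-graph idea described above can be illustrated with a minimal sketch. It is not the COAgents implementation: the three agents are replaced by simple random or greedy stand-ins, the problem is a toy single-route tour rather than a full VRP, and every name (Node, select_node, select_move, jump, search, euclidean_matrix) is hypothetical. It only shows how a graph of candidate solutions can grow on the fly, with each new node linked to its parent by a "move" or "jump" edge.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch: random/greedy stand-ins replace the learned agents.

@dataclass
class Node:
    solution: List[int]            # candidate solution (a visiting order)
    cost: float
    parent: Optional[int] = None   # index of the node this one was derived from
    edge: str = "root"             # "move" (local refinement) or "jump" (perturbation)

def euclidean_matrix(points):
    """Pairwise Euclidean distances between 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def tour_cost(tour, dist):
    """Length of the closed tour that visits the stops in the given order."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def select_node(graph, rng):
    """Node Selection stand-in: pick one of the three cheapest nodes so far."""
    best = sorted(range(len(graph)), key=lambda i: graph[i].cost)[:3]
    return rng.choice(best)

def select_move(tour, rng):
    """Move Selection stand-in: reverse a random segment (a 2-opt style move)."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def jump(tour, rng):
    """Jump stand-in: keep the first stop and shuffle all the others."""
    rest = tour[1:]
    rng.shuffle(rest)
    return [tour[0]] + rest

def search(dist, steps=200, jump_prob=0.1, seed=0):
    """Grow the partial search graph one node per step and return it."""
    rng = random.Random(seed)
    start = list(range(len(dist)))
    graph = [Node(start, tour_cost(start, dist))]
    for _ in range(steps):
        idx = select_node(graph, rng)
        parent = graph[idx]
        if rng.random() < jump_prob:
            child, kind = jump(parent.solution, rng), "jump"
        else:
            child, kind = select_move(parent.solution, rng), "move"
        graph.append(Node(child, tour_cost(child, dist), parent=idx, edge=kind))
    return graph
```

Calling search(euclidean_matrix(points)) on a small list of (x, y) tuples returns the full graph; the cheapest node is the incumbent solution, and following the parent indices recovers the chain of moves and jumps that produced it.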

The framework achieves state-of-the-art results among learning-based methods on the more challenging VRPTW benchmarks, reducing the gap to best-known solutions by 14% at N=100 and 44% at N=50 relative to the strongest neural solver (POMO), and by 21% and 40% respectively relative to ALNS. On CVRP, COAgents remains competitive with existing learn-to-search baselines. The clean separation of problem-agnostic search control from compact domain-specific encoding facilitates adaptability across tasks, making the framework a versatile tool for combinatorial optimization.
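
For readers unfamiliar with the metric, the gap to best-known solutions is conventionally computed as shown below. The symbols are assumptions introduced here for illustration: c_m is the cost found by method m, c^BKS is the best-known cost, and b is the baseline being compared against. The second expression is one common way to report a gap reduction in relative terms; the summary does not state whether its percentages are relative or absolute.

```latex
\mathrm{gap}(m) = \frac{c_m - c^{\mathrm{BKS}}}{c^{\mathrm{BKS}}} \times 100\%,
\qquad
\mathrm{reduction}(m \mid b) = \frac{\mathrm{gap}(b) - \mathrm{gap}(m)}{\mathrm{gap}(b)} \times 100\%
```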

Key Contributions

Multi-agent cooperative search: Introduces three specialized agents (Node Selection, Move Selection, Jump) that collaborate to guide the search process, balancing local refinement with global exploration.
Dynamic Partial Search Graph (PSG): Models the search space as a graph constructed on-the-fly, enabling efficient learning and navigation without requiring full enumeration.
State-of-the-art on VRPTW: Achieves the best results among learning-based methods, reducing the gap to best-known solutions relative to both POMO and ALNS at N=50 and N=100.
Modular and adaptable design: Separates search control from domain encoding, allowing easy adaptation to different VRP variants and potentially other combinatorial problems.
Open-source implementation: Code available on GitHub, enabling reproduction and extension by the research community.

Implications

For Researchers

COAgents provides a new paradigm for combining reinforcement learning with search heuristics in combinatorial optimization. The multi-agent architecture with clear role separation offers a template for tackling other hard optimization problems. Researchers can build upon this framework to explore more sophisticated agent coordination strategies, such as hierarchical or meta-learning approaches for search control.

For Developers

The modular design of COAgents makes it a practical template for building agent-based solvers for logistics, scheduling, and routing applications. Developers can adapt the framework to specific domain requirements by swapping the domain-specific encoding while retaining the core search control logic. The open-source codebase provides a solid foundation for integration into real-world systems.
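
A rough sketch of what this separation could look like in code follows. The interface and all names (DomainEncoding, LineTourEncoding, hill_climb) are hypothetical and are not taken from the COAgents codebase. A plain greedy descent stands in for the learned search control. The controller touches the problem only through the encoding's two methods, so a different VRP variant would need a new encoding class but no change to the controller.

```python
from typing import Any, Protocol, Sequence

class DomainEncoding(Protocol):
    """Hypothetical domain-specific interface used by the search controller."""
    def cost(self, solution: Any) -> float: ...
    def neighbors(self, solution: Any) -> Sequence[Any]: ...

class LineTourEncoding:
    """Toy domain: visit stops placed on a line, in a given order."""
    def __init__(self, positions):
        self.positions = positions

    def cost(self, order):
        # Total distance travelled between consecutive stops.
        pos = self.positions
        return sum(abs(pos[a] - pos[b]) for a, b in zip(order, order[1:]))

    def neighbors(self, order):
        # Every order reachable by swapping two adjacent stops.
        result = []
        for i in range(len(order) - 1):
            swapped = list(order)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            result.append(swapped)
        return result

def hill_climb(enc: DomainEncoding, start, max_steps=100):
    """Problem-agnostic control: greedy descent using only the encoding."""
    current = start
    for _ in range(max_steps):
        best = min(enc.neighbors(current), key=enc.cost, default=current)
        if enc.cost(best) >= enc.cost(current):
            break  # no neighbor improves on the current solution
        current = best
    return current
```

For example, hill_climb(LineTourEncoding([5.0, 1.0, 4.0, 2.0, 3.0]), [0, 1, 2, 3, 4]) returns an ordering whose cost is no higher than that of the starting order.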

For Users

End users in logistics and supply chain management stand to benefit from improved solution quality on complex routing problems. The smaller gap to best-known solutions on VRPTW benchmarks can translate to more efficient route planning, reduced operational costs, and better resource utilization in real-world applications.
