Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach
Abstract:
In this work, we propose a new method for obtaining heuristics based on Reinforcement Learning (RL). The synthesis algorithm is framed as an RL task with an unbounded action space, and a modified version of DQN is used. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small versions of a problem that generalize to larger instances, effectively performing zero-shot policy transfer. Our agents learn from scratch in a highly partially observable RL task and outperform the existing heuristic overall on instances unseen during training.
Year: 2023
Funders: Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación; Universidad de Buenos Aires; Agencia Nacional de Investigación e Innovación
Subjects: Artificial intelligence; Controller synthesis; Natural and Exact Sciences; Computer and Information Sciences; Computer Science
Language: English
Repository: REDI (Agencia Nacional de Investigación e Innovación)
Handle: https://hdl.handle.net/20.500.12381/3418
DOI: https://doi.org/10.48550/arXiv.2210.05393
Access level: Open access
License: Attribution 4.0 International (CC BY)
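The abstract above frames exploration for on-the-fly controller synthesis as a DQN-style RL task in which each currently enabled action is scored from a fixed-size feature vector abstracting the state-action pair, so the set of actions may vary (and grow without bound) across steps. The following Python sketch illustrates that idea only in broad strokes; it assumes PyTorch, a hypothetical feature dimension, and placeholder feature vectors, and is not the authors' implementation.

import torch
import torch.nn as nn

FEATURE_DIM = 32  # assumed size of the hand-crafted state-action feature vector


class QNetwork(nn.Module):
    """Maps a (state, action) feature vector to a scalar Q-value."""

    def __init__(self, feature_dim: int = FEATURE_DIM, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (num_enabled_actions, feature_dim) -> (num_enabled_actions,) Q-values
        return self.net(features).squeeze(-1)


def select_action(q_net: QNetwork, action_features: torch.Tensor, epsilon: float) -> int:
    # Epsilon-greedy choice over however many actions are currently enabled;
    # the number of rows in action_features can differ at every step, which is
    # how a varying (unbounded) action set is accommodated.
    if torch.rand(()).item() < epsilon:
        return int(torch.randint(action_features.shape[0], (1,)).item())
    with torch.no_grad():
        return int(q_net(action_features).argmax().item())


# Example use with random (hypothetical) feature vectors for 5 enabled actions:
q_net = QNetwork()
feats = torch.rand(5, FEATURE_DIM)
chosen = select_action(q_net, feats, epsilon=0.1)

Because the network consumes per-action feature vectors rather than a fixed output head per action, the same learned weights can be applied unchanged to larger problem instances, which is the mechanism that makes zero-shot policy transfer possible in this setting.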