Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach

Delgado, Tomás - Sánchez Sorondo, Marco - Braberman, Víctor - Uchitel, Sebastián

Abstract:

In this work, we propose a new method for obtaining heuristics based on Reinforcement Learning (RL). The synthesis algorithm is framed as an RL task with an unbounded action space, and a modified version of DQN is used. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small versions of a problem that generalize to larger instances, effectively performing zero-shot policy transfer. Our agents learn from scratch in a highly partially observable RL task and, overall, outperform the existing heuristic on instances unseen during training.
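
As an illustration of how a value function can cope with an action set of varying size, the sketch below scores each candidate action from its own feature vector, so the same weights apply no matter how many actions are available. This is not the authors' implementation: it swaps the modified DQN for a plain linear Q-function, and every name here (`q_value`, `select_action`, `td_update`, the two-element feature vectors) is hypothetical.

```python
import random

def q_value(weights, features):
    """Linear Q estimate for one candidate action, given its feature vector."""
    return sum(w * f for w, f in zip(weights, features))

def select_action(weights, candidates, epsilon=0.0, rng=random):
    """Epsilon-greedy choice over a variable-length list of feature vectors.

    Returns the index of the chosen candidate.
    """
    if not candidates:
        raise ValueError("no candidate actions to choose from")
    if rng.random() < epsilon:
        return rng.randrange(len(candidates))  # explore
    scores = [q_value(weights, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)  # exploit

def td_update(weights, features, reward, next_candidates, alpha=0.1, gamma=0.99):
    """One Q-learning step toward reward + gamma * max Q over next candidates.

    An empty next_candidates list is treated as a terminal step (assumption).
    """
    best_next = max((q_value(weights, c) for c in next_candidates), default=0.0)
    error = reward + gamma * best_next - q_value(weights, features)
    return [w + alpha * error * f for w, f in zip(weights, features)]
```

Because the weights depend only on the feature dimension, a policy fitted on steps with three candidates can be queried unchanged on steps with thirty, which is the property that makes transfer from small to larger instances conceivable in this toy setting.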


Bibliographic Details
2023
Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación
Universidad de Buenos Aires
Agencia Nacional de Investigación e Innovación
Artificial intelligence
Controller synthesis
Natural and Exact Sciences
Computer and Information Sciences
Computer Science
English
Agencia Nacional de Investigación e Innovación
REDI
https://hdl.handle.net/20.500.12381/3418
https://doi.org/10.48550/arXiv.2210.05393
Open access
Attribution 4.0 International (CC BY)