Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach

Delgado, Tomás - Sánchez Sorondo, Marco - Braberman, Víctor - Uchitel, Sebastián

Abstract:

In this work, we propose a new method, based on Reinforcement Learning (RL), for obtaining exploration heuristics for on-the-fly controller synthesis. The synthesis algorithm is framed as an RL task with an unbounded action space, and a modified version of DQN is used to solve it. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small instances of a problem that generalize to larger instances, effectively achieving zero-shot policy transfer. Our agents learn from scratch in a highly partially observable RL task and, overall, outperform the existing heuristic on instances unseen during training.
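The abstract's central technical point is that the number of available actions changes at every step, so a DQN with one output per action cannot be used directly. A common way to handle this is to score each (state, action) pair from its own fixed-length feature vector and take the maximum over whichever actions are currently available. The following is a minimal sketch of that idea, not the authors' implementation. It makes several assumptions. A linear model stands in for the neural network. The replay buffer is omitted. The feature vectors, the reward of -1 per step, and every function name (`q_value`, `select_action`, `td_target`, `sgd_update`) are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical sketch: each candidate action is described by a fixed-length
# feature vector of its (state, action) pair. A linear model stands in for
# the DQN network; the replay buffer and other DQN machinery are omitted.

Features = Sequence[float]


def q_value(weights: Sequence[float], features: Features) -> float:
    """Estimate Q(s, a) for one (state, action) feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def select_action(weights: Sequence[float], candidates: List[Features],
                  epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice among however many actions are available now.

    Returns the index of the chosen candidate. The candidate list may have a
    different length at every step (unbounded action space).
    """
    if not candidates:
        raise ValueError("no available actions")
    if rng.random() < epsilon:
        return rng.randrange(len(candidates))
    scores = [q_value(weights, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


def td_target(reward: float, next_candidates: List[Features],
              target_weights: Sequence[float], gamma: float,
              done: bool) -> float:
    """One-step target r + gamma * max_a' Q_target(s', a') over the next frontier."""
    if done or not next_candidates:
        return reward
    best_next = max(q_value(target_weights, c) for c in next_candidates)
    return reward + gamma * best_next


def sgd_update(weights: Sequence[float], features: Features, target: float,
               lr: float) -> List[float]:
    """One gradient step on 0.5 * (target - Q(s, a))**2 for a linear Q."""
    error = target - q_value(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0, 0.0]
    target_weights = list(weights)            # frozen copy, as in DQN
    frontier = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # two hypothetical candidates
    a = select_action(weights, frontier, epsilon=0.1, rng=rng)
    next_frontier = [[1.0, 1.0, 0.0]]         # hypothetical successor frontier
    y = td_target(-1.0, next_frontier, target_weights, gamma=0.99, done=False)
    weights = sgd_update(weights, frontier[a], y, lr=0.1)
    print(a, y, weights)
```

Every candidate action is reduced to a feature vector of the same fixed length, so the same weights can score the frontier of a larger instance without modification. In a setup like this, that property is what makes zero-shot transfer across instance sizes possible.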


Bibliographic details
Year: 2023
Sponsors:
Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación
Universidad de Buenos Aires
Agencia Nacional de Investigación e Innovación
Subjects: Artificial intelligence; Controller synthesis
ANII classification: Natural and Exact Sciences / Computer and Information Sciences / Computer Science
Language: English
Institution: Agencia Nacional de Investigación e Innovación
Repository: REDI
Handle: https://hdl.handle.net/20.500.12381/3418
DOI: https://doi.org/10.48550/arXiv.2210.05393
Access: Open access
License: Attribution 4.0 International (CC BY)

Record details
Venue: 33rd International Conference on Automated Planning and Scheduling. Prague, Czech Republic. 2023
Date issued: 2023-07-08
Type: Conference paper
Version: Submitted version
ANII identifier: IA_1_2022_1_173516
Related records:
https://hdl.handle.net/20.500.12381/3417
https://hdl.handle.net/20.500.12381/3419
https://hdl.handle.net/20.500.12381/3420
Full text (PDF): https://redi.anii.org.uy/jspui/bitstream/20.500.12381/3418/1/2210.05393v2.pdf
Deposited in REDI: 2024-02-16