Digital Library of the European Council for Modelling and Simulation
Title: Ethological Concepts In Hierarchical Reinforcement Learning And Control Of Intelligent Agents
Authors: Pavel Nahodil
Published in: ECMS 2009 Proceedings, edited by J. Otamendi, A. Bargiela, J. L. Montes, L. M. Doncel Pedrera. European Council for Modeling and Simulation. doi:10.7148/2009. ISBN: 978-0-9553018-8-9. 23rd European Conference on Modelling and Simulation, Madrid, June 9-12, 2009.
Citation format: Nahodil, P. (2009). Ethological Concepts In Hierarchical Reinforcement Learning And Control Of Intelligent Agents. ECMS 2009 Proceedings, edited by J. Otamendi, A. Bargiela, J. L. Montes, L. M. Doncel Pedrera (pp. 180-186). European Council for Modeling and Simulation. doi:10.7148/2009-0180-0186
DOI: http://dx.doi.org/10.7148/2009-0180-0186
Abstract: This paper integrates rigorous methods of reinforcement learning (RL) and control engineering with a behavioral (ethological) approach to agent technology. The main outcome is a hybrid architecture for intelligent autonomous agents targeted at Artificial Life-like environments. The architecture adopts several concepts from biology and shows that they can provide robust solutions in some areas. The resulting agents exhibit everything from primitive behaviors, through simple goal-directed behaviors, to complex planning. The agents are fully autonomous: feedback from the environment is evaluated against the agent's internal state and motivates the agent to perform behaviors that return it towards optimal conditions, a principle typical of animals. Learning and control are realized by multiple RL controllers working in a hierarchy of Semi-Markov Decision Processes (SMDPs). The model-free Q(λ) learning used works online, so the agents gain experience during interaction with the environment. The decomposition of the root SMDP into a hierarchy is automated, as opposed to conventional methods where it is manual. The agents themselves assess the utility of their behavior and provide rewards to the RL controller, as opposed to conventional RL methods where the mapping from situations to rewards is defined by the designer upfront. Agent behavior is continuously optimized according to the distance from the agent's optimal conditions.
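The learning core the abstract names, online model-free Q(λ), can be illustrated with a minimal tabular sketch. This is not the paper's hierarchical SMDP architecture: the chain world, the parameter values, and the Watkins-style trace cutting are all illustrative assumptions added here.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
N_STATES = 5  # toy chain 0..4; reaching state 4 ends the episode with reward 1

def step(state, action):
    """Deterministic toy chain: 'right' moves toward the goal state."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(Q, s):
    """Action with the highest Q-value in state s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def train(episodes=200, alpha=0.2, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    """Tabular Watkins-style Q(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        E = defaultdict(float)  # eligibility traces, reset each episode
        s, done = 0, False
        while not done:
            greedy_a = greedy(Q, s)
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy_a
            s2, r, done = step(s, a)
            # TD error bootstraps on the best next action (off-policy target)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
            delta = r + gamma * best_next - Q[(s, a)]
            E[(s, a)] += 1.0  # accumulating trace for the visited pair
            for key in list(E):
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam  # decay all traces
            if a != greedy_a:
                E.clear()  # Watkins's variant: cut traces after exploration
            s = s2
    return Q
```

Running `train()` and reading the greedy policy out of the returned table shows "right" selected in every non-terminal state, since the traces propagate the goal reward back along the chain online, without any model of the environment.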
Full text: