Digital Library of the European Council for Modelling and Simulation
Title: Ethological Concepts In Hierarchical Reinforcement Learning And Control Of Intelligent Agents
Authors: Pavel Nahodil
Published in: ECMS 2009 Proceedings, edited by J. Otamendi, A. Bargiela, J. L. Montes, L. M. Doncel Pedrera. European Council for Modeling and Simulation. doi:10.7148/2009. ISBN: 978-0-9553018-8-9. 23rd European Conference on Modelling and Simulation, Madrid, June 9-12, 2009.
Citation format: Nahodil, P. (2009). Ethological Concepts In Hierarchical Reinforcement Learning And Control Of Intelligent Agents. ECMS 2009 Proceedings, edited by J. Otamendi, A. Bargiela, J. L. Montes, L. M. Doncel Pedrera (pp. 180-186). European Council for Modeling and Simulation. doi:10.7148/2009-0180-0186
DOI: http://dx.doi.org/10.7148/2009-0180-0186
Abstract: This paper integrates rigorous methods of reinforcement learning (RL) and control engineering with a behavioral (ethological) approach to agent technology. The main outcome is a hybrid architecture for intelligent autonomous agents targeted at Artificial Life-like environments. The architecture adopts several concepts from biology and shows that they can provide robust solutions in some areas. The resulting agents exhibit everything from primitive behaviors, through simple goal-directed behaviors, to complex planning. The agents are fully autonomous: feedback from the environment is evaluated against the agent's internal state and motivates the agent to perform behaviors that return it towards optimal conditions, a principle typical of animals. Learning and control are realized by multiple RL controllers working in a hierarchy of Semi-Markov Decision Processes (SMDPs). The model-free Q(λ) learning used works online, so the agents gain experience during interaction with the environment. The decomposition of the root SMDP into a hierarchy is automated, as opposed to conventional methods where it is manual. The agents themselves assess the utility of their behavior and provide rewards to the RL controller, as opposed to conventional RL methods where the mapping from situations to rewards is defined by the designer upfront. Agent behavior is continuously optimized according to the distance from the agent's optimal conditions.
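The learning core the abstract names, online model-free Q(λ), can be illustrated with a minimal tabular sketch. This is not the paper's hierarchical SMDP architecture: the chain world, the parameter values, and the Watkins-style trace cutting are all illustrative assumptions added here.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
N_STATES = 5  # toy chain 0..4; reaching state 4 ends the episode with reward 1

def step(state, action):
    """Deterministic toy chain: 'right' moves toward the goal state."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(Q, s):
    """Action with the highest Q-value in state s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def train(episodes=200, alpha=0.2, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    """Tabular Watkins-style Q(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        E = defaultdict(float)  # eligibility traces, reset each episode
        s, done = 0, False
        while not done:
            greedy_a = greedy(Q, s)
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy_a
            s2, r, done = step(s, a)
            # TD error bootstraps on the best next action (off-policy target)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
            delta = r + gamma * best_next - Q[(s, a)]
            E[(s, a)] += 1.0  # accumulating trace for the visited pair
            for key in list(E):
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam  # decay all traces
            if a != greedy_a:
                E.clear()  # Watkins's variant: cut traces after exploration
            s = s2
    return Q
```

Running `train()` and reading the greedy policy out of the returned table shows "right" selected in every non-terminal state, since the traces propagate the goal reward back along the chain online, without any model of the environment.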
Full text: