Computer Science Journal of Moldova (CSJM), v.10, n.2 (29), 2002

The Analysis of Experimental Results of Reinforcement Learning Systems

Author: Jaroslav E. Poliscuk
Keywords: TD(0) algorithm, TD(Lambda) algorithm, Bellman equation, Markov decision process, eligibility traces mechanism, temporal-difference learning method, reinforcement learning method.


This article analyzes the reinforcement learning method, in which a learning agent selects actions by trial and error and receives delayed rewards. If the environment has the Markov property, its one-step dynamics make it possible to predict the next state and the next reward from the current state and action alone; such a task is a Markov decision process. The Bellman equation expresses the relationship between the value of the current state and the values of its possible successor states. The article also discusses temporal-difference learning and the mechanism of eligibility traces, together with the corresponding algorithms TD(0) and TD(Lambda). The theoretical analysis is supplemented by a practical study of an implementation of the Sarsa(Lambda) algorithm with replacing eligibility traces and an epsilon-greedy policy.
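For reference, the Markov property and the Bellman equation for a fixed policy π can be written as follows. The notation is assumed to follow Sutton and Barto, Reinforcement Learning: An Introduction (1998), and may differ from the symbols used in the article itself. Here π(s,a) is the probability of choosing action a in state s, P^a_{ss'} and R^a_{ss'} are the one-step transition probabilities and expected rewards, and γ is the discount rate.

```latex
% Markov property: the one-step dynamics depend only on the current state and action
\Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t\}
  = \Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, r_1, s_0, a_0\}

% Bellman equation for the state-value function of policy \pi
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} \mathcal{P}^{a}_{ss'}
  \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s') \right]
```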
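The TD(0) and TD(Lambda) updates can be summarized in the usual textbook form; the symbols (step size α, trace-decay parameter λ, eligibility trace e_t) are assumed here rather than taken from the article. TD(0) changes only the value of the state just visited, whereas TD(λ) spreads the same TD error δ_t over all recently visited states in proportion to their traces.

```latex
% TD error
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

% TD(0): update only the state just visited
V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t

% TD(\lambda): accumulating trace (left) or replacing trace (right)
e_t(s) = \begin{cases}
  \gamma\lambda \, e_{t-1}(s) + 1 & \text{if } s = s_t \\
  \gamma\lambda \, e_{t-1}(s)     & \text{if } s \neq s_t
\end{cases}
\qquad
e_t(s) = \begin{cases}
  1                               & \text{if } s = s_t \\
  \gamma\lambda \, e_{t-1}(s)     & \text{if } s \neq s_t
\end{cases}

V(s) \leftarrow V(s) + \alpha \, \delta_t \, e_t(s) \quad \text{for all } s
```

With λ = 0 every trace except e_t(s_t) = 1 is zero, so TD(λ) reduces to TD(0). Sarsa(λ) applies the same scheme to action values Q(s,a), with δ_t = r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t).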
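A minimal sketch of tabular Sarsa(Lambda) with replacing eligibility traces and an epsilon-greedy policy, in its common textbook form. The corridor environment, its size, the helper names (step, epsilon_greedy, sarsa_lambda) and all parameter values are hypothetical choices for illustration, not the experimental setup of the article. The replacing trace here also clears the traces of the other actions in the visited state, which is one common variant for control.

```python
import random

# Hypothetical task: a corridor of states 0..5, start at 0, terminal goal at 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Move along the corridor (a wall at 0); every move costs a reward of -1."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, -1.0


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon act randomly, otherwise greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def sarsa_lambda(episodes=500, alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with replacing traces; returns Q as q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        e = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
        s = 0
        a = epsilon_greedy(q, s, epsilon, rng)
        while s != GOAL:
            s2, r = step(s, ACTIONS[a])
            if s2 == GOAL:
                a2 = None
                delta = r - q[s][a]  # value of the terminal state is 0
            else:
                a2 = epsilon_greedy(q, s2, epsilon, rng)
                delta = r + gamma * q[s2][a2] - q[s][a]
            # Replacing trace: clear the traces of s, then set the taken action to 1.
            for b in range(len(ACTIONS)):
                e[s][b] = 0.0
            e[s][a] = 1.0
            # Update every (state, action) pair in proportion to its trace, then decay.
            for x in range(N_STATES):
                for b in range(len(ACTIONS)):
                    q[x][b] += alpha * delta * e[x][b]
                    e[x][b] *= gamma * lam
            s, a = s2, a2
    return q


if __name__ == "__main__":
    q = sarsa_lambda()
    policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
    print(policy)
```

Because the reward is -1 per move, the greedy policy learned in this toy corridor should move right in every non-terminal state; the wall at state 0 and the epsilon-greedy exploration keep both actions in each state from going untried.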

Dr. Jaroslav E. Poliscuk,
Department of Electrical Engineering, Podgorica,
University of Montenegro, Yugoslavia

