A Profit Sharing Method That Does Not Cling to Past Experiences

Transactions of the Japanese Society for Artificial Intelligence 21:81-93 (2006)

Abstract

Profit Sharing is one of the reinforcement learning methods. An agent, as a learner, selects an action according to its state-action values and receives a reward when it reaches a goal state. It then distributes the received reward over the state-action values of the rules fired along the way. This paper discusses how to set the initial state-action values. The distribution function f(x) is called the reinforcement function, and in Profit Sharing an agent learns a policy by distributing rewards with this function. On Markov Decision Processes (MDPs) the reinforcement function f(x) = 1/L^x is useful, and on Partially Observable Markov Decision Processes (POMDPs) f(x) = 1/L^W is useful, where L is the sufficient number of rules at each state and W is the length of an episode. If episodes are always long, the values of the reinforcement function become small, so the differences among rule values become small and the agent learns slowly under roulette action selection. We call this the Learning Speed Problem. Conversely, if the value of the reinforcement function for an action is much higher than its state-action value, the agent will hardly ever select any other action, which is a problem when that action is not optimal. We call this the Past Experiences Problem. This paper shows that both the Learning Speed Problem and the Past Experiences Problem are caused by a mismatch between the initial state-action values and the values of the reinforcement function. We propose a method for setting the initial state-action values at each state. Experiments show that an agent can learn correctly even when episodes are long, and demonstrate the effectiveness of the method on both MDPs and POMDPs. Our method concerns only the initialization of state-action values and places no restriction on the reinforcement function, so it can be applied with any reinforcement function.
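For concreteness, the following Python fragment sketches the credit-assignment step described above: the reward is spread over an episode with the geometric reinforcement function f(x) = 1/L^x, and actions are chosen by roulette selection over state-action values. It is a minimal illustration only; the helper names and the initial value q0 are assumptions made for this sketch, and it does not reproduce the paper's proposed initialization method.

```python
import random

# Minimal, illustrative sketch of Profit Sharing credit assignment.
# The Q-table layout, the initial value q0, and the helper names are
# assumptions for this example, not the paper's method.

def reinforcement(x, L):
    # Geometric reinforcement function f(x) = 1 / L**x, where x counts
    # steps back from the goal (x = 0 at the goal) and L is the
    # sufficient number of rules at each state.
    return 1.0 / (L ** x)

def roulette_select(q, state, actions, q0=1.0):
    # Roulette (probability-proportional) selection over state-action values.
    weights = [q.get((state, a), q0) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

def distribute_reward(q, episode, reward, L, q0=1.0):
    # episode is the ordered list of (state, action) pairs that led to the
    # goal; the reward is spread over them, weighted by f(x) so that rules
    # fired closer to the goal are reinforced more strongly.
    for x, (state, action) in enumerate(reversed(episode)):
        q[(state, action)] = q.get((state, action), q0) + reward * reinforcement(x, L)
```

When episodes are long, the increments reward * 1/L^x shrink toward zero relative to the initial value q0, which is exactly the Learning Speed Problem the abstract describes; setting q0 appropriately at each state is the focus of the proposed method.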

Similar books and articles

A Study on Reinforcement Functions in the Profit Sharing Method. Tatsumi Shoji Uemura Wataru - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:197-203.
A Proposal of an Action Planning Method for Autonomous Mobile Robots Using Reinforcement Learning. 五十嵐 治一 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:501-509.
Learning a Rational Policy That Avoids Penalties. 坪井 創吾 宮崎 和光 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16 (2):185-192.
Profit Sharing with a Detection Method for Imperfect Perception. Masuda Shiro Saito Ken - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:379-388.
A Reinforcement Learning Method Using a Temperature Distribution Based on Likelihood Information. 鈴木 健嗣 小堀 訓成 - 2005 - Transactions of the Japanese Society for Artificial Intelligence 20:297-305.
An Efficient Exploration Method for MDP Environments Using the k-Certainty Exploration Method and Dynamic Programming. Kawada Seiichi Tateyama Takeshi - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:11-19.
An Extension of the Rational Policy Making Algorithm to Continuous-Valued Inputs. 木村 元 宮崎 和光 - 2007 - Transactions of the Japanese Society for Artificial Intelligence 22 (3):332-341.
