The Optimal Algorithms for the Reinforcement Learning Problem Separated into a Learning Period and a Control Period

MAEDA, Yasunari (前田, 康成); UKITA, Yoshihumi (浮田, 善文); MATSUSHIMA, Toshiyasu (松嶋, 敏泰); HIRASAWA, Shigeichi (平澤, 茂一)

Publisher

Information Processing Society of Japan (情報処理学会)

Publication type

VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)

Article ID (NAID)

110002722119

https://kitami-it.repo.nii.ac.jp/records/8954

Name / File	License	Action
情報処理学会論文誌, 39(4), pp.1116-1126 (1.3 MB)

Item type

Journal Article(1)

Release date

2021-01-20

Title

Language

ja

Title

学習期間と制御期間に分割された強化学習問題における最適アルゴリズムの提案

Title

Language

en

Title

The Optimal Algorithms for the Reinforcement Learning Problem Separated into a Learning Period and a Control Period

Language

jpn

Resource type

Resource

http://purl.org/coar/resource_type/c_6501

Type

journal article

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Alternative title

The Optimal Algorithms for the Reinforcement Learning Problem Separated into a Learning Period and a Control Period

Language

en

Authors

MAEDA, Yasunari (前田, 康成)
WEKO 273
KAKEN - Researcher Search 30422033

UKITA, Yoshihumi (浮田, 善文)
MATSUSHIMA, Toshiyasu (松嶋, 敏泰)
HIRASAWA, Shigeichi (平澤, 茂一)

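Abstract

Description type

Abstract

Description

This study proposes optimal algorithms for a reinforcement learning problem that is divided into a learning period and a control period and is modeled by a Markov decision process whose transition probability matrix is unknown. Previous work treats the goal of the learning period as simply estimating the unknown transition probability matrix, on the grounds that identifying the true matrix would allow the reward in the control period to be maximized. With a finite learning period, however, estimation error remains, so there is no strict guarantee that the reward is maximized. We therefore propose a basic optimal algorithm that, for a reinforcement learning problem with a finite learning period and a finite control period, maximizes the reward of the control period under the Bayes criterion. Because the computational complexity of the basic optimal algorithm is of exponential order, we improve it further and propose an improved optimal algorithm. Like the basic optimal algorithm, the improved optimal algorithm maximizes the reward under the Bayes criterion, while its computational complexity is reduced to polynomial order.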

Abstract

Description type

Abstract

Description

[ENG]
In this paper, new algorithms are proposed based on statistical decision theory in the field of Markov decision processes under the condition that the transition probability matrix is unknown. In previous research on reinforcement learning (RL), learning is based only on the estimation of the unknown transition probability matrix, and the maximum reward is not received in a finite period, even though the purpose is to maximize the reward. With our algorithms it is possible to maximize the reward within a finite period with respect to the Bayes criterion. Moreover, we propose some techniques to reduce the computational complexity of our algorithm from exponential order to polynomial order.
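
To make the phrase "maximize the reward with respect to the Bayes criterion" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes known rewards, a Dirichlet posterior over each row of the unknown transition matrix (represented by positive pseudo-counts that would come from a prior plus the learning period), and a textbook finite-horizon Bayes-adaptive recursion over the control period only. The function name `bayes_control_value` and the example numbers are hypothetical. The naive recursion enumerates count histories, so its cost grows exponentially with the horizon, which is the kind of cost the abstract says the improved algorithm avoids.

```python
from functools import lru_cache


def bayes_control_value(reward, counts, start_state, horizon):
    """Bayes-expected total reward of the best policy over `horizon` steps.

    reward[s][a][s2]: known reward for the transition s --a--> s2.
    counts[s][a][s2]: positive Dirichlet pseudo-counts (assumed to be the
                      prior plus the transitions seen in the learning period).
    """
    n_states = len(reward)
    n_actions = len(reward[0])

    def bump(c, s, a, s2):
        # Copy of the nested count tuple with one more s --a--> s2 observation.
        row = c[s][a]
        new_row = row[:s2] + (row[s2] + 1,) + row[s2 + 1:]
        new_sa = c[s][:a] + (new_row,) + c[s][a + 1:]
        return c[:s] + (new_sa,) + c[s + 1:]

    @lru_cache(maxsize=None)
    def value(s, c, t):
        if t == 0:
            return 0.0
        best = float("-inf")
        for a in range(n_actions):
            row = c[s][a]
            total = sum(row)
            q = 0.0
            for s2 in range(n_states):
                # Posterior predictive probability of landing in s2.
                p = row[s2] / total
                q += p * (reward[s][a][s2] + value(s2, bump(c, s, a, s2), t - 1))
            best = max(best, q)
        return best

    frozen = tuple(tuple(tuple(r) for r in per_state) for per_state in counts)
    return value(start_state, frozen, horizon)


# Hypothetical two-state, two-action example: reward 1 for reaching state 1.
reward = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
counts = [[[1, 1], [1, 3]], [[1, 1], [1, 1]]]
print(bayes_control_value(reward, counts, start_state=0, horizon=1))  # 0.75
```

In the example, action 1 in state 0 has posterior predictive probability 3/4 of reaching the rewarding state, against 1/2 for action 0, so the one-step Bayes-optimal value is 0.75. Choosing the learning-period actions themselves, which the abstract also covers, is outside this sketch.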

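Bibliographic information

情報処理学会論文誌
en : Transactions of Information Processing Society of Japan

Vol. 39, No. 4, pp. 1116-1126, issued 1998-04-15

ISSN

Source identifier type

PISSN

Source identifier

1882-7764

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Article ID (NAID)

Identifier type

NAID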

Versions

Ver.1

2021-03-01 06:11:31.027681
