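A Study on Improving the Efficiency of Behavior Learning through Reward-Based Selection of Environmental Information (報酬に基づいた環境情報の取捨選択による行動学習の効率化に関する研究)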

木島, 康隆; KISHIMA, Yasutaka; キシマ, ヤスタカ



https://doi.org/10.15118/00005101

Name / File	License	Action
A346 (14.3 MB)
A346_summary (405.8 kB)

Item type

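Thesis or Dissertation

Publication date

2013-11-15

Title

Language

Title

報酬に基づいた環境情報の取捨選択による行動学習の効率化に関する研究 (A Study on Improving the Efficiency of Behavior Learning through Reward-Based Selection of Environmental Information)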

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_db06

Resource type

doctoral thesis

Identifier registration

10.15118/00005101

Identifier registration type

JaLC

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Author

木島, 康隆

WEKO 22717

ja	木島, 康隆
en	KISHIMA, Yasutaka
ja-Kana	キシマ, ヤスタカ

Abstract

Description type

Abstract

Description

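This thesis examines how to make reinforcement learning more efficient from two sources of information: information external to the robot and information internal to the robot. Reinforcement learning proceeds on a learning space called the Q-space, which consists of a state axis, an action axis, and a Q-value axis. The state axis represents the state of the surrounding environment as observed by the robot. The action axis represents the actions the robot can take. The Q-value axis represents the expected reward obtained when a given action is taken in a given state. The Q-space is updated on the basis of rewards. A known problem of reinforcement learning is that learning takes a long time. In particular, as the number of sensors mounted on the robot increases and the amount of information about the environment state grows, the state axis grows with it and the learning space becomes larger. A larger learning space requires correspondingly more experience, and as a result learning takes a long time. To address this problem, this study proposes methods that modify the Q-values using information external and internal to the robot in order to make learning more efficient. The external information is the experience (Q-values) of other robots, obtained by communicating with them. In the real world, time constraints limit the information a robot can acquire. The study therefore considers making learning more efficient by using Q-values from other robots, obtained through communication, in addition to the Q-values the robot has acquired itself. However, communicating indiscriminately risks acquiring information that hinders the robot's own learning and may instead lower learning efficiency. The robot should select its communication partners and communicate with those that provide information beneficial to itself. This study therefore proposes a method that learns efficiently by learning and communicating on the basis of other robots that hold information beneficial to the robot. Next, as the treatment of the robot's internal information, the Q-space itself is reshaped to suit the task. Not all sensor information is necessarily needed to carry out a task; depending on the task, some sensor information is important and some is unnecessary. While interacting with the environment, the robot statistically determines which sensors are important for the task from the correlation between sensor values and rewards. By reconstructing the Q-space using only the important sensors, the Q-space can be made smaller than before. This reduces the learning data and shortens the time required for learning. A method that realizes this is proposed. Through this selection of external and internal information, the redundant information used by the robot can be reduced, and the thesis shows that efficient learning is thereby achieved.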
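The Q-space and the reward-based sensor selection described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the one-step Q-learning rule, the Pearson correlation test, the 0.3 threshold, the toy data, and all names (`q_update`, `select_sensors`, `pearson`) are hypothetical choices made here for illustration only.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_sensors(readings, rewards, threshold=0.3):
    """Indices of sensors whose |correlation with reward| >= threshold.

    readings: list of sensor tuples, one per time step (assumed format).
    """
    keep = []
    for i in range(len(readings[0])):
        column = [r[i] for r in readings]
        if abs(pearson(column, rewards)) >= threshold:
            keep.append(i)
    return keep


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a dict-backed Q-table."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Toy log: three sensors with 5 levels each; reward depends on sensor 0 only.
rng = random.Random(0)
readings, rewards = [], []
for _ in range(500):
    s = (rng.randint(0, 4), rng.randint(0, 4), rng.randint(0, 4))
    readings.append(s)
    rewards.append(float(s[0]))

important = select_sensors(readings, rewards)
full_states = 5 ** 3                     # state axis using every sensor
reduced_states = 5 ** len(important)     # state axis after selection

# One update on the reduced Q-space.
q = defaultdict(float)
q_update(q, (0,), "left", 1.0, (1,), ["left", "right"])
```

In this toy setting the state axis shrinks from `full_states` to `reduced_states` entries, which is the kind of reduction in required experience that the abstract attributes to rebuilding the Q-space from task-relevant sensors.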

Language

Abstract

Description type

Abstract

Description

At present, reinforcement learning is the most prominent learning method used for controlling an actual robot. A robot receives environmental information from its sensors as input and performs suitable actions as output, so it needs to learn the relation between each input and output. The robot learns proper actions on the basis of a learning space that consists of an input axis, an output axis, and an evaluation axis. When the number of sensors increases, the learning space expands, and as a result the time the robot takes to learn a task increases. The objective of this thesis is to overcome this problem. Simply reducing the learning space would also reduce the learning performance. Therefore, I focus on reducing the learning time while keeping the learning space large. To achieve this, I follow two approaches. The first approach involves communicating with other robots and gathering data for learning. Typically, a robot uses only the data it collects itself. If the learning space is large, the time the robot needs to collect sufficient data increases. By using data collected from other robots, I attempt to accelerate learning. In Chapter 3, I examine the assumption that some of the collected information can have a negative impact on the robot. Accordingly, in Chapter 4, I propose a system in which a robot, when performing a task, selects only those robots that have profitable information. The second approach involves compressing the learning space by considering only the sensors necessary to perform a task. Depending on the task, some sensors are important and some are unimportant. By compressing the learning space dynamically according to the task, I attempt to accelerate learning effectively. In Chapters 5 and 6, I propose a method by which a robot statistically identifies important sensors through interaction with the environment. In each chapter, I apply the proposed methods to the path planning problem. Two kinds of environments are used: a maze and an open field. Experiments are performed using a computer simulation and an actual robot. In each case, I compare the proposed method with conventional reinforcement learning and show that it improves learning speed while maintaining high performance.
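The first approach, using data from other robots while selecting only those with profitable information, can be sketched as below. This is a rough sketch under stated assumptions, not the system proposed in Chapter 4: how usefulness is measured is not given in this record, so `peer_scores` (a per-robot usefulness estimate), the `min_score` cutoff, the averaging rule, and the function name `merge_q` are all hypothetical.

```python
def merge_q(own_q, peer_qs, peer_scores, min_score=0.0, weight=0.5):
    """Blend own Q-values with the mean Q-values of peers judged useful.

    own_q:       dict mapping (state, action) -> Q-value
    peer_qs:     dict mapping peer name -> that peer's Q-table
    peer_scores: dict mapping peer name -> usefulness estimate (assumed given)
    Peers with a score at or below min_score are ignored entirely.
    """
    useful = [q for name, q in peer_qs.items()
              if peer_scores.get(name, 0.0) > min_score]
    merged = dict(own_q)
    if not useful:
        return merged

    keys = set(own_q)
    for q in useful:
        keys |= set(q)

    for key in keys:
        peer_vals = [q[key] for q in useful if key in q]
        if not peer_vals:
            continue
        peer_mean = sum(peer_vals) / len(peer_vals)
        if key in own_q:
            # Known entry: mix own experience with the peers' estimate.
            merged[key] = (1 - weight) * own_q[key] + weight * peer_mean
        else:
            # Unvisited entry: adopt the peers' estimate as a starting point.
            merged[key] = peer_mean
    return merged


own = {("s0", "a0"): 1.0}
peers = {
    "helpful": {("s0", "a0"): 3.0, ("s1", "a1"): 2.0},
    "harmful": {("s0", "a0"): -10.0},
}
scores = {"helpful": 0.8, "harmful": -0.5}
merged = merge_q(own, peers, scores)
```

Here the low-scoring robot's Q-values never enter the merge, so its misleading estimate cannot pull the blended value down, while the entry the robot has not yet visited is filled in from the useful peer.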

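Language

Degree-granting institution

Degree-granting institution identifier scheme

kakenhi

Degree-granting institution identifier

10103

Language

Degree-granting institution name

室蘭工業大学

Language

Degree-granting institution name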

Muroran Institute of Technology

Degree name

Language

Degree name

Doctor of Engineering (博士（工学）)

Degree type

Doctorate by coursework (課程博士)

Degree conferral number

甲第346号

Report number

甲第346号

Diploma number

博甲第346号

Date of degree conferral

2013-09-26

Nippon Decimal Classification

Subject scheme

NDC

Subject

548.3

Author version flag

Publication type

VoR

Publication type resource

http://purl.org/coar/version/c_970fb48d4fbd8a85

Format

Description type

Other

Description

application/pdf
