<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-03-09T06:14:09Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:muroran-it.repo.nii.ac.jp:02000059">https://muroran-it.repo.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:muroran-it.repo.nii.ac.jp:02000059</identifier>
        <datestamp>2023-10-05T01:01:26Z</datestamp>
        <setSpec>216:325</setSpec>
        <setSpec>46</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Self-generation of reward by logarithmic transformation of multiple sensor evaluations</dc:title>
          <dc:creator>Ono, Yuya</dc:creator>
          <dc:creator>小野, 裕也</dc:creator>
          <dc:creator>Kurashige, Kentarou</dc:creator>
          <dc:creator>倉重, 健太郎</dc:creator>
          <dc:creator>Hakim Afiqe Anuar Bin Muhammad Nor</dc:creator>
          <dc:creator>Sakamoto, Yuma</dc:creator>
          <dc:creator>坂本, 悠真</dc:creator>
          <dc:subject>Self-Generation of Reward</dc:subject>
          <dc:subject>Reinforcement learning</dc:subject>
          <dc:subject>Danger recognition</dc:subject>
          <dc:description>Although the design of the reward function in reinforcement learning is important, it is difficult to design one that can adapt to a variety of environments and tasks. Therefore, we propose a method to autonomously generate rewards from sensor values, enabling task- and environment-independent reward design. Under this approach, environmental hazards are recognized by evaluating sensor values. The evaluation used for learning is obtained by integrating all the sensor evaluations that indicate danger. Although prior studies have employed weighted averages to integrate sensor evaluations, this approach does not reflect the increased danger arising from a greater number of sensor evaluations indicating danger. Instead, we propose integrating sensor evaluations using a logarithmic transformation. Through a path learning experiment, the proposed method was evaluated by comparing its rewards to those obtained from manual reward setting and prior approaches.</dc:description>
          <dc:description>journal article</dc:description>
          <dc:publisher>Springer Nature</dc:publisher>
          <dc:date>2023</dc:date>
          <dc:type>AM</dc:type>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>Artificial Life and Robotics</dc:identifier>
          <dc:identifier>2</dc:identifier>
          <dc:identifier>28</dc:identifier>
          <dc:identifier>287</dc:identifier>
          <dc:identifier>294</dc:identifier>
          <dc:identifier>1433-5298</dc:identifier>
          <dc:identifier>https://muroran-it.repo.nii.ac.jp/record/2000059/files/camera_ready.pdf</dc:identifier>
          <dc:identifier>http://hdl.handle.net/10258/0002000059</dc:identifier>
          <dc:identifier>https://muroran-it.repo.nii.ac.jp/records/2000059</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>10.1007/s10015-023-00855-1</dc:relation>
          <dc:rights>© International Society of Artificial Life and Robotics (ISAROB) 2023</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
