  • Introduction

  • Background

  • System Implementation

  • Experiments

  • Results and Analysis

  • Conclusions

  • Nuclear industry characterisation robots (i.e. radiological mapping)

  • Battery-powered robots must recharge their batteries

  • Robots must find efficient paths to the recharger

  • Use RL to find efficient paths

Agent →→ action (a_t) →→→→→→→→ Environment
  ↑                                  ↓
  ↑←← reward (r_t) ←← r_{t+1} ←←←←←←←↓
  ↑←← state (s_t)  ←← s_{t+1} ←←←←←←←↓
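The interaction loop in the diagram can be sketched in Python as below. The `Corridor` environment, its reward values and the fixed `always_right` policy are hypothetical, chosen only to show how a_t, r_{t+1} and s_{t+1} pass between agent and environment. They are not taken from the system described in these slides.

```python
class Corridor:
    """Hypothetical 1-D world: cells 0..4, with the recharger at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 10.0 if done else -1.0  # assumed values: step cost, goal bonus
        return self.state, reward, done


def always_right(state):
    """Placeholder agent: ignores the state and always moves right."""
    return +1


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the visited states and the summed reward."""
    s = env.reset()
    total = 0.0
    trace = [s]
    for _ in range(max_steps):
        a = policy(s)             # agent chooses a_t from s_t
        s, r, done = env.step(a)  # environment returns s_{t+1} and r_{t+1}
        total += r
        trace.append(s)
        if done:
            break
    return trace, total
```

Calling `run_episode(Corridor(), always_right)` walks the agent from cell 0 to the recharger. A learning agent would replace `always_right` with a policy that is updated from the observed rewards.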

initialise Q(s,a) arbitrarily
repeat (for each episode):
  initialise s
  repeat (for each step of episode):
     choose a from s using policy derived from Q (e.g. ε-greedy)
     take action a, observe r, s'
     Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]
     s ← s'
  until s is terminal
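A minimal tabular sketch of this Q-learning loop in Python follows. The 4x4 grid, the reward values (-1 per step, +10 at the recharger), the ε-greedy settings and the per-episode step cap are all assumptions made for illustration. The environment used in the experiments is not specified in these slides.

```python
import random

SIZE = 4                                      # hypothetical 4x4 grid
START = (0, 0)
GOAL = (SIZE - 1, SIZE - 1)                   # assumed recharger location
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell, clamped to the grid; return (next_state, reward, done)."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False


def q_update(q_sa, reward, best_next, alpha, gamma):
    """One backup: Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    return q_sa + alpha * (reward + gamma * best_next - q_sa)


def greedy_action(q, s):
    """Index of the highest-valued action in state s (ties go to the lowest index)."""
    return max(range(len(ACTIONS)), key=lambda i: q[(s, i)])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
               max_steps=200, seed=0):
    rng = random.Random(seed)
    # "initialise Q(s,a) arbitrarily": zeros here
    q = {((r, c), i): 0.0
         for r in range(SIZE) for c in range(SIZE)
         for i in range(len(ACTIONS))}
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):            # step cap is an added safeguard
            # epsilon-greedy policy derived from Q
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q, s)
            s2, r, done = step(s, ACTIONS[a])
            # a terminal state has no future value
            best_next = 0.0 if done else max(
                q[(s2, i)] for i in range(len(ACTIONS)))
            q[(s, a)] = q_update(q[(s, a)], r, best_next, alpha, gamma)
            s = s2
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START, stopping at GOAL or after max_steps."""
    s, path = START, [START]
    for _ in range(max_steps):
        s, _, done = step(s, ACTIONS[greedy_action(q, s)])
        path.append(s)
        if done:
            break
    return path
```

After `q = q_learning()`, calling `greedy_path(q)` reads the learned route to the recharger back out of the table. Note that the backup uses the value of the next state s', which is what lets the reward at the recharger propagate backwards along the path.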

  • RL makes a robot's behaviour more adaptable through learning

  • RL implemented in a multi-agent (MA) environment gives a more adaptable, robust and dynamically reconfigurable architecture

  • Experimental results show that RL can learn efficient control policies in a range of environments of varying complexity

  • Experimental results show that RL provides a more efficient and safer method for guiding a robot back to a recharging station than a simple non-AI method