Experiments 2/2 - Experiment Set-Up
Two sets of experiments were conducted using the system and environments described previously:
- RL rate experiments – to assess the rate at which a stable policy could be learnt using RL in a variety of robot environments.
- RL vs. non-AI control – to compare RL against a non-AI control method.
To direct the learning process in both sets of experiments, the following rewards were used: +100 if the robot reaches the battery recharging station, -1.0 if the robot makes a straight move (N, S, E, W), and -1.4 if the robot makes a diagonal move (NE, NW, SE, SW). Because every move carries a cost, these rewards encouraged the robot to reach the recharging station in as few valid moves as possible in order to maximise its total reward.
In the Q-learning algorithm, α was set to 0.1 so that learning occurs slowly, with each update shifting an estimate only a small way towards the new value. γ was set to 0.9 so that future reward is weighted heavily, although each step into the future is still discounted by a factor of 0.9. ε in the ε-greedy algorithm was set to 0.001 so that the robot explores on average only once every 1000 moves and otherwise takes the action with the highest estimated reward.
The experiment set-up is shown in the slide above. In this set-up an AIA is located on a PC, on which it runs a number of simulations of the robot environments. By running a large number of simulations in which the robot starts in a variety of different locations, the AIA can learn the optimum path the robot should follow from any start point in the environment.
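The reward scheme and parameter values above can be illustrated with a minimal tabular Q-learning sketch. This is not the system used in the experiments: the square grid, the rule that a move off the grid leaves the robot where it is, the choice to award +100 in place of the move cost on the final move, and all function names (`reward`, `step`, `choose_action`, `q_update`, `run_episode`) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Parameter values taken from the text.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.001

# The eight moves as (dx, dy) offsets; y grows southwards (an assumption).
MOVES = {
    "N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, -1), "NW": (-1, -1), "SE": (1, 1), "SW": (-1, 1),
}


def reward(action, reached_station):
    """+100 at the station, otherwise -1.0 (straight) or -1.4 (diagonal)."""
    if reached_station:
        return 100.0  # assumption: replaces the move cost on the final move
    return -1.4 if len(action) == 2 else -1.0


def step(state, action, size, station):
    """Apply a move on a hypothetical size x size grid with no obstacles."""
    dx, dy = MOVES[action]
    x, y = state[0] + dx, state[1] + dy
    if not (0 <= x < size and 0 <= y < size):
        x, y = state  # assumption: an off-grid move leaves the robot in place
    next_state = (x, y)
    reached = next_state == station
    return next_state, reward(action, reached), reached


def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.choice(list(MOVES))
    return max(MOVES, key=lambda a: q[(state, a)])


def q_update(q, state, action, r, next_state, terminal):
    """Standard one-step Q-learning update."""
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in MOVES)
    q[(state, action)] += ALPHA * (r + GAMMA * best_next - q[(state, action)])


def run_episode(q, start, size, station, rng, max_steps=10_000):
    """Run one simulated episode and return the number of moves taken."""
    state, steps = start, 0
    while state != station and steps < max_steps:
        action = choose_action(q, state, rng)
        next_state, r, done = step(state, action, size, station)
        q_update(q, state, action, r, next_state, done)
        state, steps = next_state, steps + 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)  # all Q-values start at 0
    for _ in range(2000):
        # Vary the start location, as in the set-up described above.
        start = (rng.randrange(5), rng.randrange(5))
        run_episode(q, start, 5, (4, 4), rng)
```

Because every Q-value starts at 0 and every non-terminal move has a negative reward, untried actions initially look better than tried ones, so in this sketch the robot still tries new actions early on even though ε is very small.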









