Results and Analysis 1/3 - RL Rate Experiment Results
The slide above shows the RL rate experiment results for all five environments over episodes 0 to 50,000. With this RL setup and these environments, a stable policy is learnt by approximately the 40,000-episode mark. Learning is most rapid in the first 1,000 episodes, where positive reward as a percentage of overall reward rises from 0% to 49-50%. Learning continues more slowly over the next 9,000 episodes, with positive reward rising from 49-50% to 79-82% by the 10,000-episode mark. Over the final 40,000 episodes, up to the 50,000-episode mark, learning slows further and positive reward stabilises at 85-87%. Neither the rate of learning nor the overall positive reward obtained differs much between the five environments, despite their differences in structure.
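To make the slowdown concrete, the reported figures can be turned into an average gain per 1,000 episodes for each phase. The following is a minimal sketch that uses only the percentage ranges quoted above. The names `CHECKPOINTS` and `gain_per_1000_episodes` are hypothetical, and treating each phase as a straight line between checkpoints is an assumption made for illustration, not something read off the slide.

```python
# Reported positive-reward share (% of overall reward) at each checkpoint,
# given as (low, high) ranges taken from the text above.
CHECKPOINTS = [
    (0, (0.0, 0.0)),
    (1_000, (49.0, 50.0)),
    (10_000, (79.0, 82.0)),
    (50_000, (85.0, 87.0)),
]


def gain_per_1000_episodes(checkpoints):
    """Return (start, end, min_gain, max_gain) for each phase.

    Gains are in percentage points per 1,000 episodes, assuming a
    linear change between consecutive checkpoints (an illustrative
    simplification).
    """
    phases = []
    for (ep0, (lo0, hi0)), (ep1, (lo1, hi1)) in zip(checkpoints, checkpoints[1:]):
        span = (ep1 - ep0) / 1_000      # phase length in units of 1,000 episodes
        min_gain = (lo1 - hi0) / span   # smallest rise consistent with the ranges
        max_gain = (hi1 - lo0) / span   # largest rise consistent with the ranges
        phases.append((ep0, ep1, min_gain, max_gain))
    return phases


for ep0, ep1, g_min, g_max in gain_per_1000_episodes(CHECKPOINTS):
    print(f"episodes {ep0}-{ep1}: {g_min:.3f} to {g_max:.3f} points per 1,000 episodes")
```

Under these assumptions the average gain falls from 49-50 points per 1,000 episodes in the first phase, to roughly 3.2-3.7 in the second, and to about 0.075-0.2 in the third, which matches the description of rapid early learning followed by a long plateau.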









