Application of reinforcement learning to a mobile robot in reaching recharging station operation

Abstract: Efficient control strategies for robot systems cannot always be developed by hand, especially when the robot system is operating in an unknown or uncertain environment. In this paper we show how Reinforcement Learning (RL) might be applied to improve the efficiency of a mobile robot in nuclear decommissioning characterisation, in particular allowing it to learn efficient routes back to a recharging station. We implement this learning functionality in a mobile agent (MA) environment. By doing so we can make use of the positive characteristics of MA mobility such as adaptability, fault tolerance and dynamic positioning of learning or control in a distributed system to supplement learning. Experimental results show how RL provides a more efficient method in this task than a non-AI control approach.


Submitted by soroka on Mon, 04/07/2005 - 9:28am.

A very interesting paper, and it is nice to see the comparison against non-AI methods. Have you had the opportunity to compare your reinforcement learning based method against methods based on other AI techniques? For example, neuro-fuzzy techniques have been applied to AGV navigation.


Submitted by gbecci on Mon, 04/07/2005 - 11:19am.

Regarding the AI technologies applied: the paper mentions that the AIA is based on a learning policy. My questions are:

1) Have other methodologies, such as Case-Based Reasoning, Simulated Annealing or even ANNs and GAs, been applied in combination with this “learning policy”?

2) What are the capabilities of the Grasshopper2 Agent Platform? (Can you give me a website?) Could other AI technologies be used in programming the AIA with Grasshopper2?

3) The paper describes the AIA as distributed software. Has a hierarchy or a protocol of AIAs been used to “assemble” the control system?


Submitted by liamcragg on Tue, 05/07/2005 - 10:35am.

Thank you for your comments. Re: your question, another AI method that we have compared to RL is A* search. We found that its results were very similar to those of the RL approach. It would be beneficial to test more AI approaches against RL, and this would make a good next stage in our work, although we found the performance of RL to be good. AI methods in general are a great improvement over simple reactive approaches; we have tried to highlight this in the paper.
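
As an illustration of the kind of A* search mentioned above, here is a minimal Python sketch on a small grid map. The grid encoding ('#' for an obstacle), the function name `astar` and the Manhattan-distance heuristic are assumptions made for this example; they are not details taken from the paper.

```python
import heapq

def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of equal-length strings; '#' marks an obstacle
    (an assumed encoding, chosen only for this illustration).
    """
    def h(cell):
        # Manhattan distance never overestimates for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale heap entry; a cheaper route was found later
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == '#':
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None

# Hypothetical map: robot starts at (0, 0), recharger assumed at (3, 3).
GRID = [
    "....",
    ".##.",
    ".#..",
    "....",
]
route = astar(GRID, (0, 0), (3, 3))
```

Unlike RL, a search of this kind needs the map in advance, which is one reason the two approaches are only comparable once the environment is known.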


Submitted by liamcragg on Tue, 05/07/2005 - 11:04am.

Thank you for your questions.

Re: Question 1, at the moment I have not combined the RL with other methodologies. In its current form the RL uses a simple discrete action/state space, which is adequate for the task and environment shown, but as your question suggests, RL can be combined with other methodologies such as ANNs to deal with continuous action/state spaces. It would be appropriate for me to examine these areas in the next stage of work in order to develop more complex behaviour (as robots in general have multiple sensors and actuators with continuous value ranges).

Re: Question 2, the first thing I should mention about Grasshopper 2 is that, as far as I know, it is no longer available free. I began my wider research about three years ago, and during most of that time Grasshopper 2 was available free from a dedicated website. Unfortunately, since the end of 2004 IKV++ Technologies AG, the company that made Grasshopper, has no longer supported this website. It may be worth asking them (http://www.ikv.de/) whether it is still available through them, or whether they have any newer products that have replaced it. Grasshopper is a Java-based mobile agent platform with a wide range of built-in functionality for agent manipulation (creation, destruction, migration, communication etc.). There are alternative mobile agent platforms available; one that might be worth examining is Aglets (http://www.trl.ibm.com/aglets/), which was originally developed by IBM and is now open source.

Re: Question 3, the AIA itself is a single agent (Artificial Intelligence Agent) that sits on top of the Grasshopper platform. The agent encapsulates all of the learning functionality discussed in the paper and provides the GUIs to control the learning and control process. Because the agent is mobile, if we have multiple robots/PCs it can be moved or copied within this network of PCs; e.g. if we learn a policy with the agent, it can be distributed automatically to all robots using the mobility or copying functionality provided by the underlying mobile agent platform. In the examples presented, however, we have focused on the learning task rather than the mobility characteristics of the agent.
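
The discrete action/state formulation mentioned under Question 1 can be illustrated with tabular Q-learning, the algorithm named elsewhere in this discussion. The sketch below is only an illustration: the 4x4 grid, the recharger position, the reward magnitudes and the learning parameters are all assumed values, and the function names are hypothetical rather than part of the AIA.

```python
import random

# Assumed toy environment: 4x4 grid, robot starts at (0, 0).
ROWS, COLS = 4, 4
RECHARGER = (3, 3)                            # assumed recharger cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD, GOAL_REWARD = -1.0, 10.0         # assumed reward magnitudes

def step(state, action):
    """Apply a move; the robot stays in place if it would leave the grid."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state
    nxt = (r, c)
    if nxt == RECHARGER:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_REWARD, False

def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in range(n)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(n)
            else:
                a = max(range(n), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(n))
            # Q-learning update: move the estimate toward the bootstrapped target.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, start=(0, 0), max_steps=50):
    """Follow the learned policy without exploration."""
    path, state = [start], start
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path

q_table = train()
learned_route = greedy_path(q_table)
```

Because the learned policy is just a table, it is also the kind of object a mobile agent could copy to other robots, although this sketch does not model agent mobility.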


Submitted by gonzalo on Mon, 11/07/2005 - 10:36am.

Interesting paper to read.

1.- It would be interesting to compare the RL approach using the Q-learning algorithm to other AI techniques which have been used in similar applications, specifically Genetic Programming (John Koza) and Particle Swarm Optimisation.

2.- Using the RL approach presented here, for future work, it would be interesting to see the performance of this application considering more than one robot (cooperative robots) trying to find the recharging station.

Thank you

Gonzalo.


Submitted by Grabot on Mon, 11/07/2005 - 3:12pm.

I have read your paper with interest even though I am not an expert in the field. I only have one concern: in my opinion, it is not really meaningful to compare AI and non-AI methods on the rewards obtained, since, as you mention, the non-AI method does not use this concept. Therefore, it is clear that AI methods are privileged.

Similarly, your non-AI method seems to be local (obstacle avoidance); therefore, your AI method, which has a more global approach, should be (and is) better for reaching a recharging station. What about non-AI methods based on maps?


Submitted by liamcragg on Tue, 12/07/2005 - 1:19pm.

Thank you for your comments. Re: comment 1, yes, it would be interesting to compare the RL approach using Q-learning against Genetic Programming and Particle Swarm Optimisation. Thank you for the reference to John Koza; I will investigate that. Re: comment 2, considering co-operative robots attempting to reach the recharging station would be a good idea and worth investigating.

Thanks.

Liam


Submitted by liamcragg on Tue, 12/07/2005 - 1:54pm.

Thank you for your comments. Re: comment 1, a method needed to be found to compare the AI method and the non-AI method. A good measure would include the distance travelled in getting to the recharger, relative to the number of times the recharger was reached. This would show how successful the method is in reaching the recharger and the average length of path required. The RL method employs a -ve reward for each step of the path taken by the robot and a +ve reward for reaching the recharging station, so these rewards can be recorded to capture the average path length (in terms of -ve reward) and the number of times the robot successfully reached the recharging station (in terms of +ve reward). We wanted to compare the RL to a non-AI approach, and the easiest way to do this was to record the +ve and -ve reward that the non-AI approach would have obtained had it been the RL approach, i.e. for an equivalent move by the non-AI approach a -ve reward was recorded, even though it had no significance in the calculation of the non-AI move. This allowed a direct comparison between the two methods in which neither method was privileged.

Re: comment 2, an example of a non-AI method based on maps is the potential field method. While this could be employed and may provide better results than the local approach used in the paper, it would still be subject to problems with local minima and with narrow areas of the environment through which it might not be able to pass. In contrast, an AI approach, especially RL, can learn from experience that a local minimum can be manoeuvred around, while a non-AI approach will not adapt its behaviour.
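
The reward-based comparison described under comment 1 can be sketched as a scoring function that is applied identically to trajectories from either controller. This is a hypothetical illustration: the reward magnitudes, the `RunScore` and `score_runs` names, the example trajectories, and the assumption that the final move incurs the step penalty as well as the goal reward are all choices made for this sketch rather than details from the paper.

```python
from dataclasses import dataclass

STEP_REWARD = -1.0   # assumed penalty per move
GOAL_REWARD = 10.0   # assumed reward for reaching the recharger

@dataclass
class RunScore:
    negative_reward: float  # accumulated step penalties (proxy for path length)
    positive_reward: float  # accumulated goal rewards (proxy for success count)

def score_runs(runs, recharger):
    """Score trajectories from any controller with the same reward scheme.

    Each run is a list of visited cells, starting with the start position.
    The controller that produced the run never needs to see these rewards.
    """
    neg = pos = 0.0
    for run in runs:
        if not run:
            continue  # ignore empty logs
        neg += STEP_REWARD * (len(run) - 1)  # one penalty per move made
        if run[-1] == recharger:
            pos += GOAL_REWARD               # run ended at the recharger
    return RunScore(neg, pos)

# Made-up example logs, for illustration only (not experimental data).
recharger = (1, 1)
rl_runs = [[(0, 0), (0, 1), (1, 1)]]
reactive_runs = [
    [(0, 0), (1, 0), (0, 0), (0, 1), (1, 1)],  # reaches the recharger after a detour
    [(0, 0), (1, 0)],                          # stops without reaching it
]
rl_score = score_runs(rl_runs, recharger)
reactive_score = score_runs(reactive_runs, recharger)
```

Dividing the accumulated -ve reward by the number of successful runs would give an average cost per visit to the recharger under the same assumptions.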

Thanks.

Liam


Submitted by mariasuarez on Thu, 14/07/2005 - 12:09pm.

Great paper.

I do not work in this field, but it was interesting to read your paper because you gave a very good description of the topic and the applied methods. Your answers to the questions also gave useful additional information.

Thank you


Submitted by liamcragg on Thu, 14/07/2005 - 5:50pm.

Thanks.

Liam

