Background 1/2 - RL Agent/Environment Interaction

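Agent --- action a_t ---------------------> Environment
Agent <-- reward r_{t+1}, state s_{t+1} --- Environment
(at the next time step, r_{t+1} and s_{t+1} become the agent's current r_t and s_t)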
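The interaction loop in the diagram can be sketched in Python. This is a minimal illustration under assumptions that are not in the original: `CorridorEnv` is a hypothetical toy environment (a 1-D corridor that pays a reward of 1.0 at its right-hand end), and `run_episode` and `always_right` are illustrative names. Each pass through the loop is one time step: the agent picks a_t from s_t, and the environment answers with r_{t+1} and s_{t+1}.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return a list of (s_t, a_t, r_{t+1}) transitions."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                        # agent chooses a_t in state s_t
        next_state, reward, done = env.step(action)   # environment returns s_{t+1}, r_{t+1}
        trajectory.append((state, action, reward))
        state = next_state                            # s_{t+1} becomes the current state
        if done:
            break
    return trajectory


always_right = lambda s: +1   # a trivial deterministic policy
traj = run_episode(CorridorEnv(length=5), always_right)
print(traj)  # [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 0.0), (3, 1, 1.0)]
```

The environment owns the state transition and the reward; the agent only sees what `step` hands back, which mirrors the two return arrows in the diagram.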

RL (reinforcement learning) is a control strategy in which an agent embedded in an environment attempts to maximise its total reward, or return, in pursuit of a goal. At each time step the agent is in a particular state of the environment and can take one of a number of actions available in that state. After performing an action, the agent receives feedback from the environment in the form of a reward, which indicates whether that action was a good or bad one to take in pursuit of the goal. How good it is to be in a state, or to take a particular action in a state, is captured by value functions, e.g. the action-value function (or Q-value) Qπ(s,a): the expected return when starting from state s, taking action a, and thereafter following policy π.
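To make Qπ(s,a) concrete, here is a sketch that estimates it by Monte Carlo averaging. It assumes the common definition of return as a discounted sum, G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ..., with a discount factor γ that the original does not mention; Qπ(s,a) is then the expected G_t given s_t = s, a_t = a, and all later actions drawn from π. `CorridorEnv` (a hypothetical 1-D corridor with reward 1.0 at its right end), `sampled_return`, `mc_q_estimate`, the 0.8/0.2 policy and γ = 0.9 are all illustrative choices, not part of the source.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1; reaching the
    right-hand end gives reward 1.0 and ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self, start=0):
        self.state = start
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


def sampled_return(env, s, a, policy, gamma, max_steps=200):
    """One sample of G_t: start in s, take action a, then follow the policy."""
    env.reset(start=s)
    g, discount, action = 0.0, 1.0, a
    for _ in range(max_steps):
        state, reward, done = env.step(action)
        g += discount * reward      # adds gamma^k * r_{t+k+1}
        discount *= gamma
        if done:
            break
        action = policy(state)      # every action after the first comes from pi
    return g


def mc_q_estimate(env, s, a, policy, gamma=0.9, n_samples=2000):
    """Monte Carlo estimate of Q^pi(s, a): the mean of sampled returns."""
    total = sum(sampled_return(env, s, a, policy, gamma) for _ in range(n_samples))
    return total / n_samples


rng = random.Random(0)                                  # seeded for reproducibility
pi = lambda state: +1 if rng.random() < 0.8 else -1    # stochastic policy: right w.p. 0.8
env = CorridorEnv(length=5)
q_right = mc_q_estimate(env, 2, +1, pi)   # estimate of Q^pi(2, right)
q_left = mc_q_estimate(env, 2, -1, pi)    # estimate of Q^pi(2, left)
print(q_right, q_left)
```

Both estimates share the same policy π; only the first action differs. q_right should come out noticeably larger than q_left, which is exactly the kind of comparison an agent uses to judge which action is better in a given state.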

