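Figure: Agent-environment interaction. The agent, in state s_t with reward r_t, takes action a_t; the environment responds with reward r_{t+1} and next state s_{t+1}, which are fed back to the agent.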
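The interaction loop in the figure can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the original. The environment `CorridorEnv`, its `reset`/`step` methods, the `done` flag that ends an episode, and the random agent `random_policy` are all hypothetical names invented for this sketch.

```python
import random
from typing import List, Tuple


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.position = 0

    def reset(self) -> int:
        """Start a new episode and return the initial state s_0."""
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[float, int, bool]:
        """Apply a_t (-1 = left, +1 = right); return (r_{t+1}, s_{t+1}, done)."""
        # Clamp the new position to the ends of the corridor.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # hypothetical reward: 1 only at the goal
        return reward, self.position, done


def random_policy(state: int, rng: random.Random) -> int:
    """Hypothetical agent: ignores s_t and picks left or right uniformly."""
    return rng.choice([-1, 1])


def run_episode(
    env: CorridorEnv, seed: int = 0, max_steps: int = 100
) -> List[Tuple[int, int, float]]:
    """Run one episode and record (s_t, a_t, r_{t+1}) at every step."""
    rng = random.Random(seed)
    state = env.reset()  # s_0
    trajectory: List[Tuple[int, int, float]] = []
    for _ in range(max_steps):
        action = random_policy(state, rng)           # agent emits a_t
        reward, next_state, done = env.step(action)  # env returns r_{t+1}, s_{t+1}
        trajectory.append((state, action, reward))
        state = next_state  # s_{t+1} becomes the agent's next s_t
        if done:
            break
    return trajectory


if __name__ == "__main__":
    traj = run_episode(CorridorEnv(length=5), seed=0)
    print(f"steps: {len(traj)}, return: {sum(r for _, _, r in traj)}")
```

The reassignment `state = next_state` plays the role of the feedback arrow in the figure: what the environment emits at step t+1 is what the agent observes at its next decision. Any real agent would replace `random_policy` with a policy that actually uses s_t.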