Cliff Walking#

_static/videos/toy_text/cliff_walking.gif

This environment is part of the Toy Text environments. Please read that page first for general information.

Action Space

Discrete(4)

Observation Space

Discrete(48)

Import

gym.make("CliffWalking-v0")

This is a simple implementation of the Gridworld Cliff reinforcement learning task.

Adapted from Example 6.6 (page 106) of Reinforcement Learning: An Introduction by Sutton and Barto.

With inspiration from: https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py

Description#

The board is a 4x12 matrix, with (using NumPy matrix indexing):

  • [3, 0] as the start at bottom-left

  • [3, 11] as the goal at bottom-right

  • [3, 1..10] as the cliff at bottom-center

If the agent steps on the cliff, it returns to the start. An episode terminates when the agent reaches the goal.
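The layout described above can be sketched in plain Python. This is only an illustration of the coordinates listed in this section, not the environment's own code; the names `N_ROWS`, `N_COLS`, `START`, `GOAL`, `CLIFF` and `render_layout` are hypothetical and chosen for this example.

```python
# Grid dimensions and special cells, taken from the description above.
N_ROWS, N_COLS = 4, 12
START = (3, 0)    # bottom-left
GOAL = (3, 11)    # bottom-right
CLIFF = {(3, col) for col in range(1, 11)}  # columns 1..10 of the bottom row


def render_layout():
    """Return the board as a list of strings: S=start, G=goal, C=cliff, .=free."""
    rows = []
    for r in range(N_ROWS):
        cells = []
        for c in range(N_COLS):
            if (r, c) == START:
                cells.append("S")
            elif (r, c) == GOAL:
                cells.append("G")
            elif (r, c) in CLIFF:
                cells.append("C")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


layout = render_layout()
print("\n".join(layout))
```

The first three printed rows contain only free cells, and the last row reads `SCCCCCCCCCCG`.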

Actions#

There are 4 discrete deterministic actions:

  • 0: move up

  • 1: move right

  • 2: move down

  • 3: move left
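As a rough sketch, the four actions can be expressed as row and column offsets under the NumPy-style indexing used above, where row 0 is the top row. The helper `move` and the table `ACTION_DELTAS` are hypothetical names. The handling of moves that would leave the grid is an assumption here (the agent is kept in place), since this page does not specify it.

```python
N_ROWS, N_COLS = 4, 12

# (row delta, column delta) per action id, following the list above.
# Row 0 is the top row, so "up" decreases the row index.
ACTION_DELTAS = {
    0: (-1, 0),  # up
    1: (0, 1),   # right
    2: (1, 0),   # down
    3: (0, -1),  # left
}


def move(position, action):
    """Apply one action to a (row, col) position and return the new position."""
    row, col = position
    d_row, d_col = ACTION_DELTAS[action]
    # Assumption: a move that would leave the board is clamped to the edge,
    # so the agent stays where it is. The text above does not state this.
    new_row = min(max(row + d_row, 0), N_ROWS - 1)
    new_col = min(max(col + d_col, 0), N_COLS - 1)
    return (new_row, new_col)
```

For example, `move((3, 0), 0)` gives `(2, 0)`, while `move((3, 0), 3)` leaves the agent at `(3, 0)` under the clamping assumption. Cliff and goal handling are deliberately left out of this sketch.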

Observations#

There are 3x12 + 1 = 37 reachable states, even though the observation space is Discrete(48). The agent can never occupy a cliff cell, nor the goal cell (reaching it ends the episode). What remains are all positions in the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.
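A minimal sketch of the flattened-index encoding follows. It assumes row-major order (index = row * 12 + column), which matches NumPy's default flattening but is not spelled out on this page. The function names `encode` and `decode` are hypothetical.

```python
N_ROWS, N_COLS = 4, 12
GOAL = (3, 11)
CLIFF = {(3, col) for col in range(1, 11)}


def encode(row, col):
    """Flatten a (row, col) position into a single index (assumed row-major)."""
    return row * N_COLS + col


def decode(index):
    """Recover (row, col) from a flattened index."""
    return divmod(index, N_COLS)


# Enumerate the states the agent can actually occupy:
# every cell except the cliff cells and the goal.
reachable = [
    encode(r, c)
    for r in range(N_ROWS)
    for c in range(N_COLS)
    if (r, c) not in CLIFF and (r, c) != GOAL
]

start_index = encode(3, 0)
```

Under this assumption the start cell `[3, 0]` maps to index 36, and `reachable` holds 37 indices: 0 through 35 for the first three rows, plus 36 for the start.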

Reward#

Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
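To make the reward structure concrete, here is a small self-contained model of the dynamics that sums rewards along a fixed action sequence. It is a sketch under stated assumptions, not the environment's implementation: stepping into the cliff is assumed to yield -100 in place of the usual -1 for that step, and moves off the board are assumed to leave the agent in place. The names `step` and `rollout` are hypothetical.

```python
N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}
DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def step(position, action):
    """Return (next_position, reward, terminated) for one move."""
    d_row, d_col = DELTAS[action]
    # Assumption: moves off the board are clamped to the edge.
    row = min(max(position[0] + d_row, 0), N_ROWS - 1)
    col = min(max(position[1] + d_col, 0), N_COLS - 1)
    if (row, col) in CLIFF:
        # Assumption: -100 replaces the per-step -1; the agent returns to start.
        return START, -100, False
    return (row, col), -1, (row, col) == GOAL


def rollout(actions):
    """Run a fixed action sequence from the start and sum the rewards."""
    position, total, terminated = START, 0, False
    for action in actions:
        if terminated:
            break
        position, reward, terminated = step(position, action)
        total += reward
    return position, total, terminated


# Shortest path that avoids the cliff: up once, right 11 times, down once.
safe_path = [0] + [1] * 11 + [2]
final_position, episode_return, done = rollout(safe_path)
```

The safe path takes 13 steps, so `episode_return` is -13 and the episode terminates at the goal. By contrast, `rollout([1])` steps straight into the cliff and yields a total of -100 with the agent back at the start.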

Arguments#

gym.make('CliffWalking-v0')

Version History#

  • v0: Initial version release