# Cliff Walking

This environment is part of the Toy Text environments. Please read that page first for general information.
|                   |                               |
|-------------------|-------------------------------|
| Action Space      | `Discrete(4)`                 |
| Observation Space | `Discrete(48)`                |
| Import            | `gym.make("CliffWalking-v0")` |
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
Adapted from Example 6.6 (page 106) of *Reinforcement Learning: An Introduction* by Sutton and Barto,
with inspiration from: https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py
## Description

The board is a 4×12 matrix, with (using NumPy matrix indexing):

- `[3, 0]` as the start at bottom-left
- `[3, 11]` as the goal at bottom-right
- `[3, 1..10]` as the cliff at bottom-center
If the agent steps on the cliff, it returns to the start. An episode terminates when the agent reaches the goal.
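
For orientation, here is a minimal sketch; it assumes a Gym release whose `reset` returns an `(observation, info)` tuple (Gym >= 0.26). Resetting places the agent at the bottom-left start cell:

```python
import gym

env = gym.make("CliffWalking-v0")
obs, info = env.reset()

# The grid is 4 rows x 12 columns; cell [3, 0] flattens to 3 * 12 + 0 = 36.
print(obs)  # 36, the start at bottom-left
```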
## Actions

There are 4 discrete deterministic actions:

- 0: move up
- 1: move right
- 2: move down
- 3: move left
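
For example (a sketch under the same 5-tuple `step` API assumed above), moving up from the start deterministically lands the agent one row higher:

```python
import gym

env = gym.make("CliffWalking-v0")
obs, info = env.reset()  # obs == 36, cell [3, 0]

obs, reward, terminated, truncated, info = env.step(0)  # 0: move up
print(obs)  # 24, cell [2, 0] = 2 * 12 + 0
```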
## Observations

There are 3 × 12 + 1 = 37 possible states: the agent can never stand on the cliff, and it never remains at the goal (reaching it ends the episode), so the reachable positions are all cells of the first 3 rows plus the bottom-left start cell. The observation is simply the current position encoded as a flattened index, `current_row * 12 + current_col`.
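
The encoding is plain arithmetic, so it can be sketched without the environment (the `encode`/`decode` helper names below are just for illustration):

```python
N_COLS = 12

def encode(row, col):
    """Flatten a [row, col] grid position into a Discrete(48) observation."""
    return row * N_COLS + col

def decode(obs):
    """Recover the [row, col] grid position from an observation."""
    return divmod(obs, N_COLS)

print(encode(3, 0))   # 36, the start
print(encode(3, 11))  # 47, the goal
print(decode(24))     # (2, 0)
```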
## Reward

Each time step incurs a reward of -1, and stepping into the cliff incurs a reward of -100.
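
Both rewards can be observed in two steps from the start (same assumptions as the sketches above; moving right from `[3, 0]` walks straight into the cliff):

```python
import gym

env = gym.make("CliffWalking-v0")
env.reset()

obs, reward, terminated, truncated, info = env.step(0)  # up: an ordinary move
print(reward)  # -1

env.reset()
obs, reward, terminated, truncated, info = env.step(1)  # right: into the cliff
print(reward, obs)  # -100 36, penalized and sent back to the start
```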
## Arguments

```python
gym.make('CliffWalking-v0')
```
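
Putting the pieces together, here is a sketch of one complete episode that follows the shortest safe path (up, eleven cells right along row 2, then down into the goal), again under the assumed 5-tuple `step` API:

```python
import gym

env = gym.make("CliffWalking-v0")
obs, info = env.reset()

# Shortest safe path: up, 11 x right, down -> 13 steps, total reward -13.
total_reward = 0
for action in [0] + [1] * 11 + [2]:
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(total_reward, terminated)  # -13 True
```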
## Version History

- v0: Initial version release