Reward Wrappers#

Reward Wrapper#

class gymnasium.RewardWrapper(env: Env)#

Superclass of wrappers that can modify the reward returned by a step.

If you would like to apply a function to the reward returned by the base environment before it is passed to learning code, you can inherit from RewardWrapper and override the reward() method to implement that transformation. The transformation might change the reward_range; to specify the reward_range of your wrapper, define self.reward_range in __init__().

Let us look at an example: sometimes (especially when we do not have control over the reward because it is intrinsic to the environment), we want to clip the reward to a range for numerical stability. To do that, we could implement the following wrapper:

import gymnasium
import numpy as np


class ClipReward(gymnasium.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Clip the raw reward into [min_reward, max_reward]
        return np.clip(reward, self.min_reward, self.max_reward)
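
Applying the wrapper is then just a matter of composing it with an environment; the environment and bounds below are arbitrary illustrative choices:

import gymnasium

env = ClipReward(gymnasium.make('CartPole-v1'), min_reward=-1.0, max_reward=1.0)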

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters:

env – The environment to wrap

reward(self, reward)#

Returns a modified environment reward.

Parameters:

reward – The reward returned by env.step()

Returns:

The modified reward

Transform Reward#

class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#

Transform the reward via an arbitrary function.

Warning

If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make('CartPole-v1')
>>> env = TransformReward(env, lambda r: 0.01 * r)
>>> observation, info = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01

Initialize the TransformReward wrapper with an environment and a reward transform function f.

Parameters:
  • env – The environment to apply the wrapper to

  • f – A function that transforms the reward

Normalize Reward#

class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-08)#

This wrapper normalizes immediate rewards such that their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories, so rewards will not be scaled correctly if the wrapper is newly instantiated or the policy changes.
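
The normalization is driven by a running variance estimate over a discounted-return accumulator: each raw reward is divided by the running standard deviation of that accumulator. Below is a minimal sketch of the idea; the RunningVariance helper and normalize_reward function are illustrative stand-ins, not Gymnasium's actual internals:

import numpy as np


class RunningVariance:
    # Illustrative Welford-style running variance; stands in for
    # Gymnasium's internal running-statistics helper.
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.m2 / self.count if self.count > 0 else 1.0


def normalize_reward(reward, returns, rms, gamma=0.99, epsilon=1e-8):
    # One step of the idea: update the discounted-return accumulator,
    # feed it to the running variance, then scale the raw reward.
    returns = gamma * returns + reward
    rms.update(returns)
    return reward / np.sqrt(rms.var + epsilon), returns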

Parameters:
  • env (Env) – The environment to apply the wrapper to

  • gamma (float) – The discount factor used in the exponential moving average

  • epsilon (float) – A stability parameter
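
A typical usage sketch (the environment choice, seed, and step count are arbitrary):

import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make('CartPole-v1')
env = NormalizeReward(env, gamma=0.99)

observation, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # reward has been rescaled using the wrapper's running statistics
    if terminated or truncated:
        observation, info = env.reset()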