Reward Wrappers#

Base Class#

class gymnasium.RewardWrapper(env: Env[ObsType, ActType])#

Superclass of wrappers that can modify the returning reward from a step.

If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from RewardWrapper and overwrite the method reward() to implement that transformation. This transformation might change the reward_range; to specify the reward_range of your wrapper, you can simply define self.reward_range in __init__().

Constructor for the Reward wrapper.

reward(reward: SupportsFloat) SupportsFloat#

Returns a modified environment reward.

Parameters:

reward – The env step() reward

Returns:

The modified `reward`

Available Reward Wrappers#

class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#

Transform the reward via an arbitrary function.

Warning

If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make("CartPole-v1")
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01

Initialize the TransformReward wrapper with an environment and reward transform function f.

Parameters:
  • env – The environment to apply the wrapper

  • f – A function that transforms the reward

class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#

This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.

Parameters:
  • env (env) – The environment to apply the wrapper

  • epsilon (float) – A stability parameter

  • gamma (float) – The discount factor that is used in the exponential moving average.