Reward Wrappers#

Base Class#

class gymnasium.RewardWrapper(env: Env[ObsType, ActType])#

Superclass of wrappers that can modify the reward returned by step(). If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from RewardWrapper and overwrite the method reward() to implement that transformation. This transformation might change the reward_range; to specify the reward_range of your wrapper, you can simply define self.reward_range in __init__().

Constructor for the Reward wrapper.

reward(reward: SupportsFloat) → SupportsFloat#

Returns a modified environment reward.

Parameters:

reward – The env step() reward

Returns:

The modified `reward`
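
For example, here is a minimal sketch of a custom reward wrapper that clips rewards (the class name, clipping bounds, and parameters are illustrative, not part of the Gymnasium API):

import gymnasium as gym


class ClipReward(gym.RewardWrapper):
    """Illustrative wrapper that clips every reward into [min_reward, max_reward]."""

    def __init__(self, env, min_reward=-1.0, max_reward=1.0):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        # The transformation changes the achievable rewards, so declare the new reward_range.
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Called by RewardWrapper.step() with the reward returned by the wrapped environment.
        return max(self.min_reward, min(self.max_reward, float(reward)))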

Available Reward Wrappers#

class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#

Transform the reward via an arbitrary function.

Warning

If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make("CartPole-v1")
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01

Parameters:
  • env – The environment to apply the wrapper

  • f – A function that transforms the reward
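
Another common use is reward clipping; a brief sketch, assuming CartPole-v1 as the base environment and arbitrary clip bounds:

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import TransformReward

env = gym.make("CartPole-v1")
# Clip every reward into [-1, 1]. As the warning above notes, reward_range is
# not updated automatically when f changes the achievable rewards.
env = TransformReward(env, lambda r: float(np.clip(r, -1.0, 1.0)))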

class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#

This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

Parameters:
  • env (Env) – The environment to apply the wrapper

  • epsilon (float) – A stability parameter used to avoid division by zero

  • gamma (float) – The discount factor that is used in the exponential moving average.
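
A brief usage sketch, assuming CartPole-v1 as the base environment (the seed and step count are arbitrary):

import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make("CartPole-v1")
# Rewards returned by step() are rescaled by a running estimate of the spread
# of the discounted returns, so their scale stabilises as more steps are taken.
env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)

obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()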