Reward Wrappers#

Base Class#

class gymnasium.RewardWrapper(env: Env[ObsType, ActType])#

Superclass of wrappers that can modify the reward returned by step(). If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from RewardWrapper and overwrite the method reward() to implement that transformation. This transformation might change the reward_range; to specify the reward_range of your wrapper, you can simply define self.reward_range in __init__().

Constructor for the Reward wrapper.

reward(reward: SupportsFloat) → SupportsFloat#

Returns a modified environment reward.

Parameters:

reward – The env step() reward

Returns:

The modified `reward`
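
For example, here is a minimal sketch of a custom reward wrapper that clips rewards (the class name, clipping bounds, and parameters are illustrative, not part of the Gymnasium API):

import gymnasium as gym


class ClipReward(gym.RewardWrapper):
    """Illustrative wrapper that clips every reward into [min_reward, max_reward]."""

    def __init__(self, env, min_reward=-1.0, max_reward=1.0):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        # The transformation changes the achievable rewards, so declare the new reward_range.
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Called by RewardWrapper.step() with the reward returned by the wrapped environment.
        return max(self.min_reward, min(self.max_reward, float(reward)))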

Available Reward Wrappers#

class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#

Transform the reward via an arbitrary function.

Warning

If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make("CartPole-v1")
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01

Parameters:
  • env – The environment to apply the wrapper

  • f – A function that transforms the reward
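
Another common use is reward clipping; a brief sketch, assuming CartPole-v1 as the base environment and arbitrary clip bounds:

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import TransformReward

env = gym.make("CartPole-v1")
# Clip every reward into [-1, 1]. As the warning above notes, reward_range is
# not updated automatically when f changes the achievable rewards.
env = TransformReward(env, lambda r: float(np.clip(r, -1.0, 1.0)))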

class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#

This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

Parameters:
  • env (Env) – The environment to apply the wrapper

  • epsilon (float) – A stability parameter used to avoid division by zero

  • gamma (float) – The discount factor that is used in the exponential moving average.
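
A brief usage sketch, assuming CartPole-v1 as the base environment (the seed and step count are arbitrary):

import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make("CartPole-v1")
# Rewards returned by step() are rescaled by a running estimate of the spread
# of the discounted returns, so their scale stabilises as more steps are taken.
env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)

obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()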