Reward Wrappers#
Base Class#
- class gymnasium.RewardWrapper(env: Env[ObsType, ActType])#
Superclass of wrappers that can modify the returning reward from a step.

If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from RewardWrapper and overwrite the method reward() to implement that transformation. This transformation might change the reward_range; to specify the reward_range of your wrapper, you can simply define self.reward_range in __init__(). A minimal subclass is sketched after the reward() method below.

Constructor for the Reward wrapper.
- reward(reward: SupportsFloat) → SupportsFloat#
Returns a modified environment reward.
- Parameters:
  reward – The env step() reward
- Returns:
  The modified `reward`
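As a sketch of the subclassing pattern described above, the following doctest-style example defines a wrapper that clips every reward to a fixed interval. The name ClipReward and the min_reward/max_reward parameters are illustrative and not part of Gymnasium's API; only gymnasium.RewardWrapper and its reward() hook come from the library.

>>> import gymnasium as gym
>>> import numpy as np
>>> class ClipReward(gym.RewardWrapper):
...     """Hypothetical example: clip every reward into [min_reward, max_reward]."""
...     def __init__(self, env, min_reward=-1.0, max_reward=1.0):
...         super().__init__(env)
...         self.min_reward = min_reward
...         self.max_reward = max_reward
...         # The transformation changes the reward bounds, so declare them here.
...         self.reward_range = (min_reward, max_reward)
...     def reward(self, reward):
...         # Called once per step with the base environment's reward.
...         return float(np.clip(reward, self.min_reward, self.max_reward))
...
>>> env = gym.make("CartPole-v1")
>>> env = ClipReward(env, min_reward=0.0, max_reward=0.5)
>>> _ = env.reset(seed=42)
>>> _, reward, _, _, _ = env.step(env.action_space.sample())
>>> reward
0.5

CartPole's per-step reward of 1.0 is clipped down to the declared maximum of 0.5, and reward_range now reflects the wrapper's actual output range.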
Available Reward Wrappers#
- class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#
Transform the reward via an arbitrary function.
Warning
If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.

Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make("CartPole-v1")
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01
- Parameters:
env – The environment to apply the wrapper
f – A function that transforms the reward
- class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#
This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
- Parameters:
env (env) – The environment to apply the wrapper
epsilon (float) – A stability parameter
gamma (float) – The discount factor that is used in the exponential moving average.
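A minimal usage sketch in the same doctest style as the example above. Because the wrapper keeps running statistics across steps and resets (see the note), the normalized reward values depend on the trajectory seen so far, so no output is shown here.

>>> import gymnasium as gym
>>> from gymnasium.wrappers import NormalizeReward
>>> env = gym.make("CartPole-v1")
>>> env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)
>>> _ = env.reset(seed=0)
>>> for _ in range(10):
...     # Each returned reward is scaled by the wrapper's running variance estimate.
...     observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
...     if terminated or truncated:
...         _ = env.reset()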