Reward Wrappers
Base Class
- class gymnasium.RewardWrapper(env: Env[ObsType, ActType])
Superclass of wrappers that can modify the reward returned from a step.
If you would like to apply a function to the reward returned by the base environment before passing it to learning code, you can inherit from RewardWrapper and override the method reward() to implement that transformation. This transformation might change the reward_range; to specify the reward_range of your wrapper, define self.reward_range in __init__().
Constructor for the Reward wrapper.
- reward(reward: SupportsFloat) → SupportsFloat
Returns a modified environment reward.
- Parameters:
reward – The env step() reward
- Returns:
The modified `reward`
Available Reward Wrappers
- class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])
Transform the reward via an arbitrary function.
Warning
If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make('CartPole-v1')
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01
Initialize the TransformReward wrapper with an environment and reward transform function f.
- Parameters:
env – The environment to apply the wrapper
f – A function that transforms the reward
- class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)
This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
- Parameters:
env (Env) – The environment to apply the wrapper
gamma (float) – The discount factor that is used in the exponential moving average.
epsilon (float) – A stability parameter
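To make the note about past trajectories concrete, here is a dependency-free sketch of one plausible reading of the description above: keep a discounted sum of rewards, estimate its running variance, and divide each immediate reward by the resulting standard deviation. The class RunningRewardScaler and its method names are hypothetical, and the library's actual implementation may differ in its details.

```python
import math


class RunningRewardScaler:
    """Illustrative stand-in (not Gymnasium's class) for return-based reward scaling."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        # Welford accumulators for the running variance of the discounted return.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def scale(self, reward, terminated=False):
        # Fold the new reward into the discounted return.
        self.discounted_return = self.discounted_return * self.gamma + reward

        # Update the running mean and variance with the new return value.
        self.count += 1
        delta = self.discounted_return - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.discounted_return - self.mean)
        variance = self.m2 / self.count

        # Assumption: start a fresh discounted return when an episode ends.
        if terminated:
            self.discounted_return = 0.0

        # epsilon keeps the division finite while the variance is near zero.
        return reward / math.sqrt(variance + self.epsilon)


scaler = RunningRewardScaler(gamma=0.99, epsilon=1e-8)
first = scaler.scale(1.0)  # no history yet: variance is 0, so we divide by sqrt(epsilon)
for _ in range(999):
    last = scaler.scale(1.0)  # scaling settles as statistics accumulate
```

In this sketch the very first reward is divided by sqrt(epsilon) alone and comes out far too large, while later rewards are scaled by a variance estimate built from many steps. That is the behavior the note warns about for a newly instantiated wrapper.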