- class gymnasium.RewardWrapper(env: Env[ObsType, ActType])#
Superclass of wrappers that can modify the reward returned by a step.
If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from `RewardWrapper` and overwrite the method `reward()` to implement that transformation. This transformation might change the `reward_range`; to specify the `reward_range` of your wrapper, you can simply define `self.reward_range` in your wrapper's constructor.
Constructor for the Reward wrapper.
- reward(reward: SupportsFloat) → SupportsFloat #
Returns a modified environment reward.
reward – The reward returned by the environment
The modified `reward`
Available Reward Wrappers#
- class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#
Transform the reward via an arbitrary function.
If the base environment specifies a reward range which is not invariant under f, the reward_range of the wrapped environment will be incorrect.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make("CartPole-v1")
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> _ = env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01
env – The environment to apply the wrapper
f – A function that transforms the reward
- class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#
This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
The scaling depends on past trajectories, so rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
env (env) – The environment to apply the wrapper
epsilon (float) – A stability parameter
gamma (float) – The discount factor that is used in the exponential moving average.