Reward Wrappers#
Reward Wrapper#
- class gymnasium.RewardWrapper(env: Env)#
Superclass of wrappers that can modify the reward returned by a step.
If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from `RewardWrapper` and overwrite the method `reward()` to implement that transformation. This transformation might change the `reward_range`; to specify the `reward_range` of your wrapper, you can simply define `self.reward_range` in `__init__()`.

Let us look at an example: sometimes (especially when we do not have control over the reward because it is intrinsic), we want to clip the reward to a range to gain some numerical stability. To do that, we could, for instance, implement the following wrapper:
```python
import numpy as np
import gymnasium


class ClipReward(gymnasium.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Clip the raw reward into [min_reward, max_reward]
        return np.clip(reward, self.min_reward, self.max_reward)
```
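A short usage sketch (the environment id and clipping bounds below are arbitrary choices for illustration):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
env = ClipReward(env, min_reward=-1.0, max_reward=1.0)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# reward is now guaranteed to lie in [-1.0, 1.0]
```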
Wraps an environment to allow a modular transformation of the `step()` and `reset()` methods.
- Parameters:
env – The environment to wrap
- reward(self, reward)#
Returns a modified environment `reward`.
- Parameters:
reward – The reward returned by the environment's `step()`
- Returns:
The modified `reward`
Transform Reward#
- class gymnasium.wrappers.TransformReward(env: Env, f: Callable[[float], float])#
Transform the reward via an arbitrary function.
Warning
If the base environment specifies a reward range which is not invariant under `f`, the `reward_range` of the wrapped environment will be incorrect.

Example
```python
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> env = gym.make('CartPole-v1')
>>> env = TransformReward(env, lambda r: 0.01*r)
>>> env.reset()
>>> observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> reward
0.01
```
Initialize the `TransformReward` wrapper with an environment and reward transform function `f`.
- Parameters:
env – The environment to apply the wrapper
f – A function that transforms the reward
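Note that a simple clipping transform like the `ClipReward` example above can also be expressed with `TransformReward` instead of a custom subclass; a minimal sketch, with arbitrary environment id and bounds:

```python
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import TransformReward

env = gym.make("CartPole-v1")
# Clip every reward into [-1.0, 1.0] via an arbitrary callable
env = TransformReward(env, lambda r: float(np.clip(r, -1.0, 1.0)))
```

Keep in mind the warning above: the wrapped environment's `reward_range` is not updated to reflect the transform.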
Normalize Reward#
- class gymnasium.wrappers.NormalizeReward(env: Env, gamma: float = 0.99, epsilon: float = 1e-08)#
This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
- Parameters:
env (env) – The environment to apply the wrapper
epsilon (float) – A stability parameter
gamma (float) – The discount factor that is used in the exponential moving average.
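A minimal usage sketch (environment id, step count, and hyperparameters are arbitrary); as the note above points out, the scaling statistics need some experience before rewards are scaled meaningfully:

```python
import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make("CartPole-v1")
env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)

obs, info = env.reset()
for _ in range(1000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    # `reward` is scaled by a running estimate of the standard deviation
    # of the discounted return, so its magnitude stabilises over time
    if terminated or truncated:
        obs, info = env.reset()
```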