Wrappers#

gymnasium.Wrapper#

class gymnasium.Wrapper(env: Env)#

Wraps a gymnasium.Env to allow a modular transformation of the step() and reset() methods.

This is the base class of all wrappers that change the behavior of an underlying environment. A wrapper can modify the action_space, observation_space, reward_range and metadata that it exposes without changing the corresponding attributes of the underlying environment.

In addition, several attributes (spec, render_mode, np_random) point back to the wrapper’s environment.

Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly. Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can also be chained to combine their effects. Most environments that are generated via gymnasium.make will already be wrapped by default.

In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along with (possibly optional) parameters to the wrapper’s constructor.

>>> import gymnasium as gym
>>> from gymnasium.wrappers import RescaleAction
>>> base_env = gym.make("BipedalWalker-v3")
>>> base_env.action_space
Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
>>> wrapped_env.action_space
Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)

You can access the environment underneath the first wrapper by using the env attribute. Because the Wrapper class inherits from Env, env can itself be another wrapper.

>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.env
<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>

If you want to get to the environment underneath all of the layers of wrappers, you can use the .unwrapped attribute. If the environment is already a bare environment, the .unwrapped attribute will just return itself.

>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.unwrapped
<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>

There are three common things you might want a wrapper to do:

Transform actions before applying them to the base environment
Transform observations that are returned by the base environment
Transform rewards that are returned by the base environment

Such wrappers can be easily implemented by inheriting from ActionWrapper, ObservationWrapper, or RewardWrapper and implementing the respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the Wrapper class directly. The code that is presented in the following sections can also be found in the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository.

Note

Don’t forget to call super().__init__(env) if your subclass overrides __init__.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters: env – The environment to wrap

Methods#

gymnasium.Wrapper.step(self, action: ActType) → Tuple[ObsType, float, bool, bool, dict]#: Calls the step() of the wrapped env; override this method to change the returned data.

gymnasium.Wrapper.reset(self, **kwargs) → Tuple[ObsType, dict]#: Calls the reset() of the wrapped env; override this method to change the returned data.

gymnasium.Wrapper.close(self)#: Closes the wrapper and env.

Attributes#

property Wrapper.action_space: Space[ActType]#: Returns the Env action_space unless it has been overridden, in which case the wrapper’s action_space is used.

property Wrapper.observation_space: Space#: Returns the Env observation_space unless it has been overridden, in which case the wrapper’s observation_space is used.

property Wrapper.reward_range: Tuple[SupportsFloat, SupportsFloat]#: Returns the Env reward_range unless it has been overridden, in which case the wrapper’s reward_range is used.

property Wrapper.spec#: Returns the Env spec attribute.

property Wrapper.metadata: dict#: Returns the Env metadata.

property Wrapper.np_random: Generator#: Returns the Env np_random attribute.

property Wrapper.unwrapped: Env#: Returns the base environment of the wrapper.

Gymnasium Wrappers#

Gymnasium provides a number of commonly used wrappers, listed below. More information on a particular wrapper can be found on the page for its wrapper type.

Name	Type	Description
`AtariPreprocessing`	Misc Wrapper	Implements the common preprocessing applied to Atari environments
`AutoResetWrapper`	Misc Wrapper	The wrapped environment will automatically reset when the terminated or truncated state is reached.
`ClipAction`	Action Wrapper	Clip the continuous action to the valid bound specified by the environment’s action_space
`EnvCompatibility`	Misc Wrapper	Provides compatibility for environments written in the OpenAI Gym v0.21 API to look like Gymnasium environments
`FilterObservation`	Observation Wrapper	Filters a dictionary observation space to only the requested keys
`FlattenObservation`	Observation Wrapper	An observation wrapper that flattens the observation
`FrameStack`	Observation Wrapper	An observation wrapper that stacks the observations in a rolling manner.
`GrayScaleObservation`	Observation Wrapper	Convert the image observation from RGB to gray scale.
`HumanRendering`	Misc Wrapper	Allows human-like rendering for environments that support “rgb_array” rendering
`NormalizeObservation`	Observation Wrapper	This wrapper will normalize observations s.t. each coordinate is centered with unit variance.
`NormalizeReward`	Reward Wrapper	This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
`OrderEnforcing`	Misc Wrapper	This will produce an error if step or render is called before reset
`PixelObservationWrapper`	Observation Wrapper	Augments observations with pixel values obtained via render; the pixels can be added to, or replace, the environment’s observation.
`RecordEpisodeStatistics`	Misc Wrapper	This will keep track of cumulative rewards and episode lengths returning them at the end.
`RecordVideo`	Misc Wrapper	This wrapper will record videos of rollouts.
`RenderCollection`	Misc Wrapper	Enables list versions of render modes, i.e. “rgb_array_list” for “rgb_array”, such that the renderings for each step are saved in a list until render is called.
`RescaleAction`	Action Wrapper	Rescales the continuous action space of the environment to a range [min_action, max_action], where min_action and max_action are numpy arrays or floats.
`ResizeObservation`	Observation Wrapper	This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple shape.
`StepAPICompatibility`	Misc Wrapper	Modifies an environment step function from (old) done to the (new) termination / truncation API.
`TimeAwareObservation`	Observation Wrapper	Augment the observation with current time step in the trajectory (by appending it to the observation).
`TimeLimit`	Misc Wrapper	This wrapper will emit a truncated signal if the specified number of steps is exceeded in an episode.
`TransformObservation`	Observation Wrapper	This wrapper will apply a function to observations
`TransformReward`	Reward Wrapper	This wrapper will apply a function to rewards
`VectorListInfo`	Misc Wrapper	This wrapper will convert the info of a vectorized environment from the dict format to a list of dictionaries where the i-th dictionary contains info of the i-th environment.

Implementing a custom wrapper#

Sometimes you might need to implement a wrapper that makes more complicated modifications (e.g. modifying the reward based on data in info, or changing the rendering behavior). Such wrappers can be implemented by inheriting directly from gymnasium.Wrapper.

You can set a new action or observation space by defining self.action_space or self.observation_space in __init__, respectively.
You can set new metadata and reward range by defining self.metadata and self.reward_range in __init__, respectively.
You can override step, render, close etc. If you do this, you can access the environment that was passed to your wrapper (which still might be wrapped in some other wrapper) by accessing the attribute self.env.

Let’s also take a look at an example for this case. Most MuJoCo environments return a reward that consists of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during initialization of the environment. However, Reacher does not allow you to do this! Nevertheless, all individual terms of the reward are returned in info, so let us build a wrapper for Reacher that allows us to weight those terms:

import gymnasium as gym

class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        # Weights applied to the two reward terms reported in info
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        # Discard the environment's own reward and recompute it from info
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info

Note

It is not sufficient to use a RewardWrapper in this case, because its reward method only receives the scalar reward and has no access to info.