Misc Wrappers#
- class gymnasium.wrappers.AtariPreprocessing(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)#
Atari 2600 preprocessing wrapper.
This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.
Specifically, the following preprocessing stages are applied to the Atari environment:
- Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.
- Frame skipping: The number of frames skipped between steps, 4 by default.
- Max-pooling: Pools over the most recent two observations from the frame skips.
- Termination signal when a life is lost: When the agent loses a life, the episode is terminated. Turned off by default. Not recommended by Machado et al. (2018).
- Resize to a square image: Resizes the original Atari observation shape from 210x160 to 84x84 by default.
- Grayscale observation: Whether the observation is returned in colour or grayscale; grayscale by default.
- Scale observation: Whether to scale the observation to the range [0, 1) or leave it in [0, 255]; not scaled by default.
Wrapper for Atari 2600 preprocessing.
- Parameters:
env (Env) – The environment to apply the preprocessing
noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset; set to 0 to turn off.
frame_skip (int) – The number of frames between new observations; this affects the frequency at which the agent experiences the game.
screen_size (int) – The size to which the Atari frame is resized (screen_size x screen_size).
terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.
grayscale_obs (bool) – if True, then a grayscale observation is returned; otherwise, an RGB observation is returned.
grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.
scale_obs (bool) – if True, then the observation is normalized to the range [0, 1). This also limits the memory optimization benefits of the FrameStack wrapper.
- Raises:
DependencyNotInstalled – If the opencv-python package is not installed
ValueError – If frame-skipping is not disabled in the original environment (e.g. by using the NoFrameskip variant)
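The max-pooling stage described above can be illustrated in isolation. The sketch below is not the wrapper's actual implementation; it only shows, assuming NumPy arrays as frames, how the element-wise maximum over the two most recent frames would be computed:

```python
import numpy as np

def max_pool_frames(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Element-wise maximum over the two most recent frames.

    Pooling suppresses flickering sprites that some Atari games
    draw only on alternating frames.
    """
    return np.maximum(frame_a, frame_b)

# Two toy 2x2 grayscale "frames" with flickering pixels
a = np.array([[0, 255], [10, 0]], dtype=np.uint8)
b = np.array([[255, 0], [0, 20]], dtype=np.uint8)
pooled = max_pool_frames(a, b)
# pooled == [[255, 255], [10, 20]]
```

The real wrapper applies this pooling to the last two raw frames of each frame-skip window before resizing and grayscaling.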
- class gymnasium.wrappers.AutoResetWrapper(env: Env)#
A class for providing an automatic reset functionality for gymnasium environments when calling self.step().

When a call to step causes Env.step() to return terminated=True or truncated=True, Env.reset() is called, and the return format of self.step() is as follows: (new_obs, final_reward, final_terminated, final_truncated, info) with the new step API and (new_obs, final_reward, final_done, info) with the old step API.

- new_obs is the first observation after calling self.env.reset()
- final_reward is the reward after calling self.env.step(), prior to calling self.env.reset()
- final_terminated is the terminated value before calling self.env.reset()
- final_truncated is the truncated value before calling self.env.reset(); final_terminated and final_truncated cannot both be False
- info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with an additional key "final_observation" containing the observation returned by the last call to self.env.step() and "final_info" containing the info dict returned by the last call to self.env.step()
- Warning: When using this wrapper to collect rollouts, note that when Env.step() returns terminated or truncated, a new observation from after calling Env.reset() is returned by Env.step() alongside the final reward, terminated and truncated state from the previous episode. If you need the final state from the previous episode, you need to retrieve it via the "final_observation" key in the info dict. Make sure you know what you're doing if you use this wrapper!
A class for providing an automatic reset functionality for gymnasium environments when calling self.step().
- Parameters:
env (gym.Env) – The environment to apply the wrapper
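The auto-reset flow described above can be sketched with a toy environment. This is a simplified illustration of the documented behaviour under the new step API, not the wrapper's actual implementation, and ToyEnv is a hypothetical stand-in:

```python
class ToyEnv:
    """A hypothetical environment that terminates after 3 steps."""
    def reset(self):
        self.t = 0
        return 0, {}  # observation, info
    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return self.t, 1.0, terminated, False, {}

class AutoResetSketch:
    """Illustrative auto-reset: on termination/truncation, reset the env
    and stash the final observation/info in the info dict."""
    def __init__(self, env):
        self.env = env
    def reset(self):
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            final_obs, final_info = obs, info
            obs, info = self.env.reset()
            info = dict(info, final_observation=final_obs, final_info=final_info)
        return obs, reward, terminated, truncated, info

env = AutoResetSketch(ToyEnv())
obs, _ = env.reset()
for _ in range(3):
    obs, reward, terminated, truncated, info = env.step(0)
# On the terminating step, obs is the first observation of the NEW
# episode; the last observation sits under info["final_observation"].
```

This mirrors the warning above: after a terminal step, the returned observation already belongs to the next episode.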
- class gymnasium.wrappers.EnvCompatibility(old_env: LegacyEnv, render_mode: Optional[str] = None)#
A wrapper which can transform an environment from the old API to the new API.
Old step API refers to step() method returning (observation, reward, done, info), and reset() only returning the observation. New step API refers to step() method returning (observation, reward, terminated, truncated, info) and reset() returning (observation, info). (Refer to docs for details on the API change)
Known limitations:
- Environments that use self.np_random might not work as expected.
A wrapper which converts old-style envs to valid modern envs.
Some information may be lost in the conversion, so we recommend updating your environment.
- Parameters:
old_env (LegacyEnv) – the env to wrap, implemented with the old API
render_mode (str) – the render mode to use when rendering the environment, passed automatically to env.render
- class gymnasium.wrappers.StepAPICompatibility(env: Env, output_truncation_bool: bool = True)#
A wrapper which can transform an environment from new step API to old and vice-versa.
Old step API refers to step() method returning (observation, reward, done, info). New step API refers to step() method returning (observation, reward, terminated, truncated, info). (Refer to docs for details on the API change)
- Parameters:
env (gym.Env) – the env to wrap. Can be in old or new API
output_truncation_bool (bool) – If True, the environment is converted to the new step API, which returns the two booleans terminated and truncated; if False, the old step API with a single done boolean is used. (True by default)
Examples
>>> env = gym.make("CartPole-v1")
>>> env  # wrapper not applied by default, set to new API
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env = gym.make("CartPole-v1", apply_api_compatibility=True)  # set to old API
<StepAPICompatibility<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>>
>>> env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)  # manually using wrapper on unregistered envs
A wrapper which can transform an environment from new step API to old and vice-versa.
- Parameters:
env (gym.Env) – the env to wrap. Can be in old or new API
output_truncation_bool (bool) – Whether the wrapper’s step method outputs two booleans (new API) or one boolean (old API)
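The core of the old-to-new conversion can be sketched as follows. This is an illustrative approximation, not gymnasium's exact implementation; it assumes the common convention that a time-limit truncation is flagged in info under the "TimeLimit.truncated" key:

```python
def convert_to_new_api(step_returns):
    """Convert an old-style (obs, reward, done, info) tuple to the new
    (obs, reward, terminated, truncated, info) form.

    Sketch only: a done caused purely by a time limit becomes
    truncated=True, terminated=False; any other done becomes
    terminated=True.
    """
    obs, reward, done, info = step_returns
    truncated = info.pop("TimeLimit.truncated", False)
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info

# A step that ended because of a time limit:
new = convert_to_new_api((1, 0.0, True, {"TimeLimit.truncated": True}))
# new == (1, 0.0, False, True, {})
```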
- class gymnasium.wrappers.PassiveEnvChecker(env)#
A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gymnasium API.
Initialises the wrapper with the environment, running the observation and action space tests.
- class gymnasium.wrappers.HumanRendering(env)#
Performs human rendering for an environment that only supports "rgb_array" rendering.
This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven't implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.

The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.

Example
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRendering(env)
>>> wrapped.reset()  # This will start rendering to the screen
The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").

Example

>>> env = gym.make("NoNativeRendering-v2", render_mode="human")  # NoNativeRendering-v2 doesn't implement human-rendering natively
>>> env.reset()  # This will start rendering to the screen
- Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment's) render method will always return an empty list:

>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRendering(env)
>>> wrapped.reset()
>>> env.render()  # env.render() will always return an empty list!
[]
Initialize a HumanRendering instance.
- Parameters:
env – The environment that is being wrapped
- class gymnasium.wrappers.OrderEnforcing(env: Env, disable_render_order_enforcing: bool = False)#
A wrapper that will produce an error if step() is called before an initial reset().

Example

>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> env = CartPoleEnv()
>>> env = OrderEnforcing(env)
>>> env.step(0)
ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
ResetNeeded: Cannot call env.render() before calling env.reset()
>>> env.reset()
>>> env.render()
>>> env.step(0)
A wrapper that will produce an error if step() is called before an initial reset().
- Parameters:
env – The environment to wrap
disable_render_order_enforcing – Whether to disable render order enforcing
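The order-enforcing logic itself is simple to sketch. The following is a minimal illustration, not gymnasium's actual code; ResetNeeded and ToyEnv are hypothetical stand-ins:

```python
class ResetNeeded(Exception):
    """Hypothetical stand-in for gymnasium's ResetNeeded error."""

class OrderEnforcingSketch:
    """Raise an error if step() is called before the first reset()."""
    def __init__(self, env):
        self.env = env
        self._has_reset = False
    def reset(self, **kwargs):
        self._has_reset = True
        return self.env.reset(**kwargs)
    def step(self, action):
        if not self._has_reset:
            raise ResetNeeded("Cannot call env.step() before calling env.reset()")
        return self.env.step(action)

class ToyEnv:
    def reset(self):
        return 0, {}
    def step(self, action):
        return 0, 0.0, False, False, {}

env = OrderEnforcingSketch(ToyEnv())
try:
    env.step(0)  # raises ResetNeeded
except ResetNeeded as e:
    message = str(e)
env.reset()
obs, *_ = env.step(0)  # fine after reset
```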
- class gymnasium.wrappers.RecordEpisodeStatistics(env: Env, deque_size: int = 100)#
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, indicating whether the env at the respective index has the episode statistics.

After the completion of an episode, info will look like this:

>>> info = {
...     ...
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }
For vectorized environments, the output will be in the form of:

>>> infos = {
...     ...
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
return_queue – The cumulative rewards of the last deque_size-many episodes
length_queue – The lengths of the last deque_size-many episodes
This wrapper will keep track of cumulative rewards and episode lengths.
- Parameters:
env (Env) – The environment to apply the wrapper
deque_size – The size of the buffers return_queue and length_queue
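The bookkeeping described above can be sketched like this. It is a simplified illustration of the documented behaviour, not the wrapper's actual implementation, and ToyEnv is a hypothetical stand-in:

```python
import time
from collections import deque

class EpisodeStatsSketch:
    """Track cumulative reward, length, and elapsed time per episode."""
    def __init__(self, env, deque_size=100):
        self.env = env
        self.return_queue = deque(maxlen=deque_size)
        self.length_queue = deque(maxlen=deque_size)
    def reset(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.episode_start = time.perf_counter()
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.episode_return += reward
        self.episode_length += 1
        if terminated or truncated:
            info = dict(info)
            info["episode"] = {
                "r": self.episode_return,
                "l": self.episode_length,
                "t": time.perf_counter() - self.episode_start,
            }
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
        return obs, reward, terminated, truncated, info

class ToyEnv:
    """Hypothetical env giving reward 1.0 and terminating after 2 steps."""
    def reset(self):
        self.t = 0
        return 0, {}
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, False, {}

env = EpisodeStatsSketch(ToyEnv())
env.reset()
env.step(0)
_, _, _, _, info = env.step(0)
# info["episode"]["r"] == 2.0, info["episode"]["l"] == 2
```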
- class gymnasium.wrappers.RecordVideo(env: Env, video_folder: str, episode_trigger: Optional[Callable[[int], bool]] = None, step_trigger: Optional[Callable[[int], bool]] = None, video_length: int = 0, name_prefix: str = 'rl-video', disable_logger: bool = False)#
This wrapper records videos of rollouts.
Usually, you only want to record episodes intermittently, say every hundredth episode. To do this, you can specify either episode_trigger or step_trigger (not both). They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively. If neither episode_trigger nor step_trigger is passed, a default episode_trigger will be employed. By default, the recording will be stopped once a terminated or truncated signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.

Wrapper records videos of rollouts.
- Parameters:
env – The environment that will be wrapped
video_folder (str) – The folder where the recordings will be stored
episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode
step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step
video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured
name_prefix (str) – Will be prepended to the filename of the recordings
disable_logger (bool) – Whether to disable moviepy logger or not.
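An episode_trigger is just a predicate over the episode index. For instance, to record every hundredth episode (the function name below is user-defined, not part of the API):

```python
def every_hundredth(episode_id: int) -> bool:
    """Record episodes 0, 100, 200, ..."""
    return episode_id % 100 == 0

# Would be passed as, e.g.:
#   RecordVideo(env, video_folder="videos", episode_trigger=every_hundredth)
recorded = [i for i in range(250) if every_hundredth(i)]
# recorded == [0, 100, 200]
```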
- class gymnasium.wrappers.RenderCollection(env: Env, pop_frames: bool = True, reset_clean: bool = True)#
Save collection of render frames.
Initialize a RenderCollection instance.
- Parameters:
env – The environment that is being wrapped
pop_frames (bool) – If true, clear the collection frames after .render() is called. Default value is True.
reset_clean (bool) – If true, clear the collection frames when .reset() is called. Default value is True.
- class gymnasium.wrappers.TimeLimit(env: Env, max_episode_steps: Optional[int] = None)#
This wrapper will issue a truncated signal if a maximum number of timesteps is exceeded.
If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued. Critically, this is different from the terminated signal that originates from the underlying environment as part of the MDP.
Example
>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> from gymnasium.wrappers import TimeLimit
>>> env = CartPoleEnv()
>>> env = TimeLimit(env, max_episode_steps=1000)
Initializes the TimeLimit wrapper with an environment and the number of steps after which truncation will occur.
- Parameters:
env – The environment to apply the wrapper
max_episode_steps – An optional max episode steps (if None, env.spec.max_episode_steps is used)
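The truncation logic described above can be sketched as follows. This is a simplified illustration, not gymnasium's actual implementation, and NeverEndingEnv is a hypothetical stand-in:

```python
class TimeLimitSketch:
    """Issue truncated=True once max_episode_steps have elapsed."""
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # Truncation is separate from termination: the MDP did not
            # end, we simply stopped interacting with it.
            truncated = True
        return obs, reward, terminated, truncated, info

class NeverEndingEnv:
    def reset(self):
        return 0, {}
    def step(self, action):
        return 0, 0.0, False, False, {}

env = TimeLimitSketch(NeverEndingEnv(), max_episode_steps=3)
env.reset()
results = [env.step(0) for _ in range(3)]
truncations = [r[3] for r in results]
# truncations == [False, False, True]
```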
- class gymnasium.wrappers.VectorListInfo(env)#
Converts infos of vectorized environments from dict to List[dict].
This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. It is intended to be used around vectorized environments. If other wrappers that operate on info, such as RecordEpisodeStatistics, are also used, this needs to be the outermost wrapper, i.e. VectorListInfo(RecordEpisodeStatistics(envs)).
Example:
>>> # actual
>>> {
...     "k": np.array([0., 0., 0.5, 0.3]),
...     "_k": np.array([False, False, True, True])
... }
>>> # classic
>>> [{}, {}, {"k": 0.5}, {"k": 0.3}]
This wrapper will convert the info into the list format.
- Parameters:
env (Env) – The environment to apply the wrapper
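The dict-to-list conversion shown in the example can be sketched in plain Python. This is a simplified illustration that ignores nested info dicts, not the wrapper's actual implementation:

```python
def dict_info_to_list(infos: dict, num_envs: int) -> list:
    """Convert vectorized dict-style infos to a list of per-env dicts.

    Keys prefixed with "_" are boolean masks marking which
    sub-environments actually produced the corresponding value.
    """
    list_info = [{} for _ in range(num_envs)]
    for key, values in infos.items():
        if key.startswith("_"):
            continue  # masks are consumed below, not copied over
        mask = infos.get("_" + key, [True] * num_envs)
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info

infos = {"k": [0.0, 0.0, 0.5, 0.3], "_k": [False, False, True, True]}
converted = dict_info_to_list(infos, num_envs=4)
# converted == [{}, {}, {"k": 0.5}, {"k": 0.3}]
```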