Misc Wrappers#

class gymnasium.wrappers.AtariPreprocessing(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)[source]#

Atari 2600 preprocessing wrapper.

This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.

Specifically, the following preprocessing stages are applied to the Atari environment:

  • Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.

  • Frame skipping: The number of frames skipped between steps, 4 by default

  • Max-pooling: Pools over the most recent two observations from the frame skips

  • Termination signal when a life is lost: When the agent loses a life during an episode, the environment terminates.

    Turned off by default. Not recommended by Machado et al. (2018).

  • Resize to a square image: Resizes the Atari environment's original observation shape from 210x160 to 84x84 by default

  • Grayscale observation: Whether the observation is returned in colour or greyscale; greyscale by default.

  • Scale observation: Whether to scale the observation to the range [0, 1) instead of leaving it in [0, 255); not scaled by default.

Parameters:
  • env (Env) – The environment to apply the preprocessing

  • noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset; set to 0 to turn off.

  • frame_skip (int) – The number of frames between new observations, affecting the frequency at which the agent experiences the game.

  • screen_size (int) – The side length of the square to which Atari frames are resized.

  • terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.

  • grayscale_obs (bool) – if True, then gray scale observation is returned, otherwise, RGB observation is returned.

  • grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.

  • scale_obs (bool) – if True, then observation normalized in range [0,1) is returned. It also limits memory optimization benefits of FrameStack Wrapper.

Raises:
  • DependencyNotInstalled – If the opencv-python package is not installed.

  • ValueError – If frame-skipping is enabled in the base environment; disable it there so this wrapper can perform it.

class gymnasium.wrappers.AutoResetWrapper(env: Env)[source]#

A class for providing an automatic reset functionality for gymnasium environments when calling self.step().

When calling step causes Env.step() to return terminated=True or truncated=True, Env.reset() is called, and the return format of self.step() is as follows: (new_obs, final_reward, final_terminated, final_truncated, info) with new step API and (new_obs, final_reward, final_done, info) with the old step API.

  • new_obs is the first observation after calling self.env.reset()

  • final_reward is the reward after calling self.env.step(), prior to calling self.env.reset().

  • final_terminated is the terminated value before calling self.env.reset().

  • final_truncated is the truncated value before calling self.env.reset(). final_terminated and final_truncated cannot both be False.

  • info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with an additional key “final_observation” containing the observation returned by the last call to self.env.step() and “final_info” containing the info dict returned by the last call to self.env.step().

Warning

When using this wrapper to collect rollouts, note that when Env.step() returns terminated or truncated, a new observation from after calling Env.reset() is returned by Env.step() alongside the final reward, terminated and truncated state from the previous episode. If you need the final state from the previous episode, you need to retrieve it via the “final_observation” key in the info dict. Make sure you know what you’re doing if you use this wrapper!

Parameters:

env (gym.Env) – The environment to apply the wrapper

class gymnasium.wrappers.EnvCompatibility(old_env: LegacyEnv, render_mode: str | None = None)[source]#

A wrapper which can transform an environment from the old API to the new API.

Old step API refers to the step() method returning (observation, reward, done, info) and reset() returning only the observation. New step API refers to step() returning (observation, reward, terminated, truncated, info) and reset() returning (observation, info). (Refer to the docs for details on the API change.)

Known limitations:
  • Environments that use self.np_random might not work as expected.

Parameters:
  • old_env (LegacyEnv) – the env to wrap, implemented with the old API

  • render_mode (str) – the render mode to use when rendering the environment, passed automatically to env.render

class gymnasium.wrappers.StepAPICompatibility(env: Env, output_truncation_bool: bool = True)[source]#

A wrapper which can transform an environment from new step API to old and vice-versa.

Old step API refers to the step() method returning (observation, reward, done, info). New step API refers to step() returning (observation, reward, terminated, truncated, info). (Refer to the docs for details on the API change.)

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import StepAPICompatibility
>>> env = gym.make("CartPole-v1")
>>> env # wrapper not applied by default, set to new API
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env = StepAPICompatibility(gym.make("CartPole-v1"))
>>> env
<StepAPICompatibility<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>>
Parameters:
  • env (gym.Env) – the env to wrap. Can be in old or new API

  • output_truncation_bool (bool) – Whether the wrapper’s step method outputs two booleans (new API) or one boolean (old API)

class gymnasium.wrappers.PassiveEnvChecker(env)[source]#

A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gymnasium API.

Initialises the wrapper with the environment, running the observation and action space tests.

class gymnasium.wrappers.HumanRendering(env)[source]#

Performs human rendering for an environment that only supports "rgb_array" rendering.

This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven’t implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.

The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import HumanRendering
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()     # This will start rendering to the screen

The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").

>>> env = gym.make("phys2d/CartPole-v1", render_mode="human")      # phys2d/CartPole-v1 doesn't implement human-rendering natively
>>> obs, _ = env.reset()     # This will start rendering to the screen

Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment’s) render method will always return an empty list:

>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()
>>> env.render()     # env.render() will always return an empty list!
[]
Parameters:

env – The environment that is being wrapped

class gymnasium.wrappers.OrderEnforcing(env: Env, disable_render_order_enforcing: bool = False)[source]#

A wrapper that will produce an error if step() is called before an initial reset().

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import OrderEnforcing
>>> env = gym.make("CartPole-v1", render_mode="human")
>>> env = OrderEnforcing(env)
>>> env.step(0)
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call `env.render()` before calling `env.reset()`, if this is a intended action, set `disable_render_order_enforcing=True` on the OrderEnforcer wrapper.
>>> _ = env.reset()
>>> env.render()
>>> _ = env.step(0)
>>> env.close()
Parameters:
  • env – The environment to wrap

  • disable_render_order_enforcing – Whether to disable render order enforcing

class gymnasium.wrappers.RecordEpisodeStatistics(env: Env, deque_size: int = 100)[source]#

This wrapper will keep track of cumulative rewards and episode lengths.

At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, indicating whether the env at the respective index has episode statistics.

After the completion of an episode, info will look like this:

>>> info = {
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }

For vectorized environments, the output will be in the form of:

>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }

Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.

Variables:
  • return_queue – The cumulative rewards of the last deque_size-many episodes

  • length_queue – The lengths of the last deque_size-many episodes

Parameters:
  • env (Env) – The environment to apply the wrapper

  • deque_size – The size of the buffers return_queue and length_queue

class gymnasium.wrappers.RecordVideo(env: Env, video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int = 0, name_prefix: str = 'rl-video', disable_logger: bool = False)[source]#

This wrapper records videos of rollouts.

Usually, you only want to record episodes intermittently, say every hundredth episode. To do this, you can specify either episode_trigger or step_trigger (not both). They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively. If neither episode_trigger nor step_trigger is passed, a default episode_trigger will be employed. By default, the recording will be stopped once a terminated or truncated signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.

Parameters:
  • env – The environment that will be wrapped

  • video_folder (str) – The folder where the recordings will be stored

  • episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode

  • step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step

  • video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured

  • name_prefix (str) – Will be prepended to the filename of the recordings

  • disable_logger (bool) – Whether to disable the moviepy logger.

class gymnasium.wrappers.RenderCollection(env: Env, pop_frames: bool = True, reset_clean: bool = True)[source]#

Save collection of render frames.

Parameters:
  • env – The environment that is being wrapped

  • pop_frames (bool) – If true, clear the collected frames after .render() is called. Default value is True.

  • reset_clean (bool) – If true, clear the collected frames when .reset() is called. Default value is True.

class gymnasium.wrappers.TimeLimit(env: Env, max_episode_steps: int)[source]#

This wrapper will issue a truncated signal if a maximum number of timesteps is exceeded.

If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued. Critically, this is different from the terminated signal that originates from the underlying environment as part of the MDP.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TimeLimit
>>> env = gym.make("CartPole-v1")
>>> env = TimeLimit(env, max_episode_steps=1000)
Parameters:
  • env – The environment to apply the wrapper

  • max_episode_steps – The maximum number of episode steps (if None, env.spec.max_episode_steps is used)

class gymnasium.wrappers.VectorListInfo(env)[source]#

Converts infos of vectorized environments from dict to List[dict].

This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. This wrapper is intended to be used around vectorized environments. If using other wrappers that perform operations on info, such as RecordEpisodeStatistics, this needs to be the outermost wrapper.

i.e. VectorListInfo(RecordEpisodeStatistics(envs))

Example

>>> # As dict:
>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
>>> # As list:
>>> infos = [
...     {
...         "episode": {"r": "<cumulative reward>", "l": "<episode length>", "t": "<elapsed time since beginning of episode>"},
...         "final_observation": "<observation>",
...         "final_info": {},
...     },
...     ...,
... ]
Parameters:

env (Env) – The environment to apply the wrapper