Misc Wrappers

Common Wrappers

class gymnasium.wrappers.TimeLimit(env: Env, max_episode_steps: int)

Limits the number of steps for an environment by truncating the episode once a maximum number of timesteps is reached.

If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued. Critically, this is different from the terminated signal that originates from the underlying environment as part of the MDP. No vector wrapper exists.
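The distinction matters most when computing bootstrapped value targets: a terminated episode has no future return, whereas a truncated episode was only cut short. The following is a minimal sketch of a one-step TD target under that convention. The names `td_target`, `next_value` and `gamma` are illustrative assumptions and are not part of Gymnasium.

```python
def td_target(reward: float, next_value: float, terminated: bool, gamma: float = 0.99) -> float:
    """One-step TD target that bootstraps unless the MDP itself terminated.

    `truncated` is deliberately not an argument: a time-limit cut-off does not
    imply that the next state has zero value, so we still bootstrap from it.
    """
    bootstrap = 0.0 if terminated else gamma * next_value
    return reward + bootstrap


# Episode cut short by the time limit (terminated=False, truncated=True): keep bootstrapping.
truncated_target = td_target(reward=1.0, next_value=10.0, terminated=False, gamma=0.9)

# True terminal state (terminated=True): no future value is added.
terminal_target = td_target(reward=1.0, next_value=10.0, terminated=True, gamma=0.9)

print(truncated_target, terminal_target)
```

Treating a truncated step as terminal would bias the value estimates of states near the time limit towards zero.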

Example using the TimeLimit wrapper:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TimeLimit
>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> spec = gym.spec("CartPole-v1")
>>> spec.max_episode_steps
500
>>> env = gym.make("CartPole-v1")
>>> env  # TimeLimit is included within the environment stack
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env.spec  
EnvSpec(id='CartPole-v1', ..., max_episode_steps=500, ...)
>>> env = gym.make("CartPole-v1", max_episode_steps=3)
>>> env.spec  
EnvSpec(id='CartPole-v1', ..., max_episode_steps=3, ...)
>>> env = TimeLimit(CartPoleEnv(), max_episode_steps=10)
>>> env
<TimeLimit<CartPoleEnv instance>>
Example of TimeLimit truncating the episode at the step limit:
>>> env = gym.make("CartPole-v1", max_episode_steps=3)
>>> _ = env.reset(seed=123)
>>> _ = env.action_space.seed(123)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, False)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, False)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, True)
Change logs:
  • v0.10.6 - Initially added

  • v0.25.0 - With the step API update, the termination and truncation signals are returned separately.

Parameters:
  • env – The environment to apply the wrapper

  • max_episode_steps – An optional max episode steps (if None, env.spec.max_episode_steps is used)

class gymnasium.wrappers.RecordVideo(env: gym.Env[ObsType, ActType], video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int = 0, name_prefix: str = 'rl-video', fps: int | None = None, disable_logger: bool = True)

Records videos of environment episodes using the environment’s render function.

Usually, you only want to record episodes intermittently, say every hundredth episode or at every thousandth environment step. To do this, you can specify episode_trigger or step_trigger. They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively.

The episode_trigger should return True on the episode when recording should start. The step_trigger should return True on the n-th environment step at which the recording should start, where n is counted across all previous episodes. If neither episode_trigger nor step_trigger is passed, a default episode_trigger, capped_cubic_video_schedule(), is used. This function starts a video at every episode whose index is a perfect cube (0, 1, 8, 27, ...) up to 1000 and then at every 1000th episode. By default, the recording is stopped once reset is called. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.
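To make the default schedule concrete, here is a small sketch of a trigger with the behaviour described above, reading "cubic" as perfect cubes (0, 1, 8, 27, ...) below episode 1000. The function name `capped_cubic_schedule` is hypothetical, and the library's own function may differ in detail.

```python
def capped_cubic_schedule(episode_id: int) -> bool:
    """Illustrative episode trigger: perfect cubes below 1000, then every 1000th episode."""
    if episode_id < 1000:
        # True when episode_id is a perfect cube: 0, 1, 8, 27, ...
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    # From episode 1000 onwards, record every 1000th episode.
    return episode_id % 1000 == 0


# Episodes that would be recorded among the first 3001 episodes.
recorded = [ep for ep in range(3001) if capped_cubic_schedule(ep)]
print(recorded)
```

Any callable with the same `int -> bool` shape can be passed as episode_trigger, so a custom schedule only needs to follow this pattern.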

No vector version of the wrapper exists.

Examples - Run the environment for 50 episodes, and save the video every 10 episodes starting from the 0th:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> trigger = lambda t: t % 10 == 0
>>> env = RecordVideo(env, video_folder="./save_videos1", episode_trigger=trigger, disable_logger=True)
>>> for i in range(50):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos1"))
5
Examples - Run the environment for 5 episodes, start a recording every 200th step, making sure each video is 100 frames long:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> trigger = lambda t: t % 200 == 0
>>> env = RecordVideo(env, video_folder="./save_videos2", step_trigger=trigger, video_length=100, disable_logger=True)
>>> for i in range(5):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     _ = env.action_space.seed(123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos2"))
2
Examples - Run 3 episodes, record everything, but in chunks of 1000 frames:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RecordVideo(env, video_folder="./save_videos3", video_length=1000, disable_logger=True)
>>> for i in range(3):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos3"))
2
Change logs:
  • v0.25.0 - Initially added to replace wrappers.monitoring.VideoRecorder

Parameters:
  • env – The environment that will be wrapped

  • video_folder (str) – The folder where the recordings will be stored

  • episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode

  • step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step

  • video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured

  • name_prefix (str) – Will be prepended to the filename of the recordings

  • fps (int) – The frames per second of the video. Provides a custom video fps for the environment; if None, the environment metadata render_fps key is used if it exists, otherwise a default value of 30 is used.

  • disable_logger (bool) – Whether to disable the moviepy logger; it is disabled by default

class gymnasium.wrappers.RecordEpisodeStatistics(env: Env[ObsType, ActType], buffer_length: int = 100, stats_key: str = 'episode')

This wrapper will keep track of cumulative rewards and episode lengths.

At the end of an episode, the statistics of the episode are added to info under the key given by stats_key (episode by default). When using a vectorized environment, the key _episode is also added; it indicates whether the environment at the respective index has episode statistics. A vector version of the wrapper exists, gymnasium.wrappers.vector.RecordEpisodeStatistics.

After the completion of an episode, info will look like this:

>>> info = {
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }

For vectorized environments, the output will be in the form of:

>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }

Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
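As a rough illustration of consuming these statistics, the sketch below reads the episode entry from an info dict shaped like the one shown above. The helper `summarize_episode` and the sample values are assumptions made up for the example; only the key names "r", "l" and "t" come from the description.

```python
from typing import Any, Optional


def summarize_episode(info: dict[str, Any], stats_key: str = "episode") -> Optional[str]:
    """Return a one-line summary if `info` carries episode statistics, else None."""
    stats = info.get(stats_key)
    if stats is None:
        # Mid-episode steps do not carry the statistics key.
        return None
    return f"return={float(stats['r']):.1f} length={int(stats['l'])} time={float(stats['t']):.2f}s"


# Hand-written sample dicts standing in for what env.step() would return.
mid_episode_info: dict[str, Any] = {}
final_step_info = {"episode": {"r": 23.0, "l": 23, "t": 0.0142}}

print(summarize_episode(mid_episode_info))
print(summarize_episode(final_step_info))
```

If the wrapper was created with a custom stats_key, the same value has to be passed to the helper.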

Variables:
  • time_queue (*) – The time length of the last buffer_length-many episodes

  • return_queue (*) – The cumulative rewards of the last buffer_length-many episodes

  • length_queue (*) – The lengths of the last buffer_length-many episodes

Change logs:
Parameters:
  • env (Env) – The environment to apply the wrapper

  • buffer_length – The size of the buffers return_queue, length_queue and time_queue

  • stats_key – The info key for the episode statistics

class gymnasium.wrappers.AtariPreprocessing(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)

Implements the common preprocessing techniques for Atari environments (excluding frame stacking).

For frame stacking, use gymnasium.wrappers.FrameStackObservation. No vector version of the wrapper exists.

This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.

Specifically, the following preprocessing stages are applied to the Atari environment:

  • Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.

  • Frame skipping: The number of frames skipped between steps, 4 by default.

  • Max-pooling: Pools over the most recent two observations from the frame skips.

  • Termination signal when a life is lost: When the agent loses a life, the environment is terminated.

    Turned off by default. Not recommended by Machado et al. (2018).

  • Resize to a square image: Resizes the Atari environment's original observation shape from 210x160 to 84x84 by default.

  • Grayscale observation: Makes the observation greyscale, enabled by default.

  • Grayscale new axis: Extends the last channel of the observation such that the image is 3-dimensional, not enabled by default.

  • Scale observation: Whether to scale the observation between [0, 1) or [0, 255), not scaled by default.
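The max-pooling stage is the least obvious item in the list above. It is commonly motivated by Atari games that draw some sprites only on alternating frames, so a single frame can miss an object. The sketch below shows the element-wise maximum over two frames using plain nested lists. The helper `max_pool_frames` and the tiny 2x2 frames are illustrative assumptions; the wrapper itself operates on image arrays.

```python
Frame = list[list[int]]


def max_pool_frames(previous: Frame, current: Frame) -> Frame:
    """Element-wise maximum of two equally sized frames."""
    return [
        [max(p, c) for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# A sprite visible only in the first frame and another visible only in the second.
frame_a = [[0, 255], [0, 0]]
frame_b = [[0, 0], [128, 0]]

pooled = max_pool_frames(frame_a, frame_b)
print(pooled)  # both sprites survive in the pooled frame
```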

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import AtariPreprocessing
>>> env = gym.make("ALE/Adventure-v5") 
>>> env = AtariPreprocessing(env, noop_max=10, frame_skip=1, screen_size=84, terminal_on_life_loss=True, grayscale_obs=False, grayscale_newaxis=False)  # frame_skip must be positive
Change logs:
  • Added in gym v0.12.2 (gym #1455)

Parameters:
  • env (Env) – The environment to apply the preprocessing

  • noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset. Set to 0 to turn this off.

  • frame_skip (int) – The number of frames between new observations, which affects the frequency at which the agent experiences the game.

  • screen_size (int) – The size to which the Atari frame is resized.

  • terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.

  • grayscale_obs (bool) – if True, then gray scale observation is returned, otherwise, RGB observation is returned.

  • grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.

  • scale_obs (bool) – if True, the observation is returned normalized to the range [0, 1). This also limits the memory optimization benefits of the FrameStackObservation wrapper.

Raises:
  • DependencyNotInstalled – opencv-python package not installed

  • ValueError – Disable frame-skipping in the original env

Uncommon Wrappers

class gymnasium.wrappers.Autoreset(env: Env)

The wrapped environment is automatically reset when a terminated or truncated state is reached.

When calling step causes Env.step() to return terminated=True or truncated=True, Env.reset() is called, and the return format of self.step() is as follows: (new_obs, final_reward, final_terminated, final_truncated, info) with new step API and (new_obs, final_reward, final_done, info) with the old step API. No vector version of the wrapper exists.

  • new_obs is the first observation after calling self.env.reset()

  • final_reward is the reward after calling self.env.step(), prior to calling self.env.reset().

  • final_terminated is the terminated value before calling self.env.reset().

  • final_truncated is the truncated value before calling self.env.reset(). final_terminated and final_truncated cannot both be False.

  • info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with an additional key “final_observation” containing the observation returned by the last call to self.env.step() and “final_info” containing the info dict returned by the last call to self.env.step().

Warning

When using this wrapper to collect rollouts, note that when Env.step() returns terminated or truncated, a new observation from after calling Env.reset() is returned by Env.step() alongside the final reward, terminated and truncated state from the previous episode. If you need the final state from the previous episode, you need to retrieve it via the “final_observation” key in the info dict. Make sure you know what you’re doing if you use this wrapper!
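A minimal sketch of the bookkeeping this warning asks for, assuming the return format described above, in which info carries a "final_observation" key on the step that ends an episode. According to the change log for this wrapper, v1.0.0 removed that key, so this applies only to versions that still populate it. The helper `true_next_observation` and the sample values are hypothetical.

```python
from typing import Any


def true_next_observation(step_obs: Any, terminated: bool, truncated: bool, info: dict[str, Any]) -> Any:
    """Pick the observation that really followed the action just taken.

    When the episode ended, `step_obs` already belongs to the next episode,
    so the last observation of the finished episode is read from `info`.
    """
    if terminated or truncated:
        return info["final_observation"]
    return step_obs


# Hand-written stand-ins for the values returned by a step that ended an episode.
obs_after_reset = [0.0, 0.0]
last_obs_of_episode = [2.4, 0.3]
info = {"final_observation": last_obs_of_episode, "final_info": {}}

next_obs = true_next_observation(obs_after_reset, terminated=True, truncated=False, info=info)
print(next_obs)
```

Storing `next_obs` rather than the returned observation keeps transitions in a replay buffer from crossing episode boundaries.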

Change logs:
  • v0.24.0 - Initially added as AutoResetWrapper

  • v1.0.0 - Renamed to Autoreset, and the autoreset order was changed to reset on the step after the environment terminates or truncates. As a result, “final_observation” and “final_info” were removed.

Parameters:

env (gym.Env) – The environment to apply the wrapper

class gymnasium.wrappers.PassiveEnvChecker(env: Env[ObsType, ActType])

A passive wrapper that surrounds the step, reset and render functions to check they follow Gymnasium’s API.

This wrapper is automatically applied during make and can be disabled with disable_env_checker. No vector version of the wrapper exists.

Example

>>> import gymnasium as gym
>>> env = gym.make("CartPole-v1")
>>> env
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env = gym.make("CartPole-v1", disable_env_checker=True)
>>> env
<TimeLimit<OrderEnforcing<CartPoleEnv<CartPole-v1>>>>
Change logs:
  • v0.24.1 - Initially added, however broken in several ways

  • v0.25.0 - All bugs were fixed

  • v0.29.0 - Removed warnings for infinite bounds for Box observation and action spaces and irregular bound shapes

Initialises the wrapper with the environment and runs the observation and action space tests.

class gymnasium.wrappers.HumanRendering(env: Env[ObsType, ActType])

Allows human-like rendering for environments that support “rgb_array” rendering.

This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven’t implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.

The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.

No vector version of the wrapper exists.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import HumanRendering
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()     # This will start rendering to the screen

The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").

>>> env = gym.make("phys2d/CartPole-v1", render_mode="human")      # CartPoleJax-v1 doesn't implement human-rendering natively
>>> obs, _ = env.reset()     # This will start rendering to the screen

Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment’s) render method will always return an empty list:

>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()
>>> env.render() # env.render() will always return an empty list!
[]
Change logs:
  • v0.25.0 - Initially added

Parameters:

env – The environment that is being wrapped

class gymnasium.wrappers.OrderEnforcing(env: Env[ObsType, ActType], disable_render_order_enforcing: bool = False)

Will produce an error if step or render is called before reset.

No vector version of the wrapper exists.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import OrderEnforcing
>>> env = gym.make("CartPole-v1", render_mode="human")
>>> env = OrderEnforcing(env)
>>> env.step(0)
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call `env.render()` before calling `env.reset()`, if this is an intended action, set `disable_render_order_enforcing=True` on the OrderEnforcer wrapper.
>>> _ = env.reset()
>>> env.render()
>>> _ = env.step(0)
>>> env.close()
Change logs:
  • v0.22.0 - Initially added

  • v0.24.0 - Added order enforcing for the render function

Parameters:
  • env – The environment to wrap

  • disable_render_order_enforcing – Whether to disable render order enforcing

class gymnasium.wrappers.RenderCollection(env: Env[ObsType, ActType], pop_frames: bool = True, reset_clean: bool = True)

Collects rendered frames of an environment such that render returns a list[RenderedFrame].

No vector version of the wrapper exists.

Example

Return the list of frames for the number of steps render wasn’t called:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
6

>>> frames = env.render()
>>> len(frames)
0

Return the list of frames for the number of steps the episode was running:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env, pop_frames=False)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
6

>>> frames = env.render()
>>> len(frames)
6

Collect all frames for all episodes, without clearing them when render is called:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env, pop_frames=False, reset_clean=False)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
12

>>> frames = env.render()
>>> len(frames)
12
Change logs:
  • v0.26.2 - Initially added

Parameters:
  • env – The environment that is being wrapped

  • pop_frames (bool) – If true, clear the collected frames after render() is called. Default value is True.

  • reset_clean (bool) – If true, clear the collected frames when reset() is called. Default value is True.
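The interaction of the two flags can be modelled with a toy buffer. This is a sketch of the behaviour described above, not the Gymnasium class. It assumes that one frame is collected on reset and one per step, which is consistent with the counts of 6 and 12 in the examples. `FrameBuffer` and the string "frames" are hypothetical.

```python
class FrameBuffer:
    """Toy model of frame collection with pop_frames and reset_clean flags."""

    def __init__(self, pop_frames: bool = True, reset_clean: bool = True) -> None:
        self.pop_frames = pop_frames
        self.reset_clean = reset_clean
        self.frames: list[str] = []

    def reset(self) -> None:
        if self.reset_clean:
            self.frames = []  # drop frames from the previous episode
        self.frames.append("reset-frame")

    def step(self) -> None:
        self.frames.append("step-frame")

    def render(self) -> list[str]:
        frames = self.frames
        if self.pop_frames:
            self.frames = []  # hand over the frames and start a fresh list
        return frames


# Defaults: the first render returns 1 reset frame + 5 step frames, the second is empty.
buffer = FrameBuffer()
buffer.reset()
for _ in range(5):
    buffer.step()
first_count, second_count = len(buffer.render()), len(buffer.render())

# Keep everything: two episodes accumulate and render never clears the list.
keep_all = FrameBuffer(pop_frames=False, reset_clean=False)
for _ in range(2):
    keep_all.reset()
    for _ in range(5):
        keep_all.step()
kept_count = len(keep_all.render())

print(first_count, second_count, kept_count)
```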

Data Conversion Wrappers

class gymnasium.wrappers.JaxToNumpy(env: Env[ObsType, ActType])

Wraps a Jax-based environment so that it can be interacted with using NumPy arrays.

Actions must be provided as numpy arrays and observations will be returned as numpy arrays. A vector version of the wrapper exists, gymnasium.wrappers.vector.JaxToNumpy.

Notes

The Jax to NumPy and NumPy to Jax conversions do not guarantee a roundtrip (jax -> numpy -> jax, or vice versa). The reason is that jax does not support non-array values, so, for example, numpy int32(5) -> DeviceArray([5], dtype=jnp.int32)

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import JaxToNumpy
>>> env = gym.make("JaxEnv-vx")                                 
>>> env = JaxToNumpy(env)                                       
>>> obs, _ = env.reset(seed=123)                                
>>> type(obs)                                                   
<class 'numpy.ndarray'>
>>> action = env.action_space.sample()                          
>>> obs, reward, terminated, truncated, info = env.step(action) 
>>> type(obs)                                                   
<class 'numpy.ndarray'>
>>> type(reward)                                                
<class 'float'>
>>> type(terminated)                                            
<class 'bool'>
>>> type(truncated)                                             
<class 'bool'>
Change logs:
  • v1.0.0 - Initially added

Parameters:

env – the jax environment to wrap

class gymnasium.wrappers.JaxToTorch(env: gym.Env, device: Device | None = None)

Wraps a Jax-based environment so that it can be interacted with using PyTorch Tensors.

Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors. A vector version of the wrapper exists, gymnasium.wrappers.vector.JaxToTorch.

Note

Rendered frames are returned as a NumPy array, not a PyTorch Tensor.

Example

>>> import torch                                                
>>> import gymnasium as gym
>>> from gymnasium.wrappers import JaxToTorch
>>> env = gym.make("JaxEnv-vx")
>>> env = JaxToTorch(env)
>>> obs, _ = env.reset(seed=123)                                
>>> type(obs)                                                   
<class 'torch.Tensor'>
>>> action = torch.tensor(env.action_space.sample())            
>>> obs, reward, terminated, truncated, info = env.step(action) 
>>> type(obs)                                                   
<class 'torch.Tensor'>
>>> type(reward)                                                
<class 'float'>
>>> type(terminated)                                            
<class 'bool'>
>>> type(truncated)                                             
<class 'bool'>
Change logs:
  • v1.0.0 - Initially added

Parameters:
  • env – The Jax-based environment to wrap

  • device – The device the torch Tensors should be moved to

class gymnasium.wrappers.NumpyToTorch(env: gym.Env, device: Device | None = None)

Wraps a NumPy-based environment so that it can be interacted with using PyTorch Tensors.

Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors. A vector version of the wrapper exists, gymnasium.wrappers.vector.NumpyToTorch.

Note

Rendered frames are returned as a NumPy array, not a PyTorch Tensor.

Example

>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.wrappers import NumpyToTorch
>>> env = gym.make("CartPole-v1")
>>> env = NumpyToTorch(env)
>>> obs, _ = env.reset(seed=123)
>>> type(obs)
<class 'torch.Tensor'>
>>> action = torch.tensor(env.action_space.sample())
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> type(obs)
<class 'torch.Tensor'>
>>> type(reward)
<class 'float'>
>>> type(terminated)
<class 'bool'>
>>> type(truncated)
<class 'bool'>
Change logs:
  • v1.0.0 - Initially added

Parameters:
  • env – The NumPy-based environment to wrap

  • device – The device the torch Tensors should be moved to