Misc Wrappers
Common Wrappers
- class gymnasium.wrappers.TimeLimit(env: Env, max_episode_steps: int)[source]#
Limits the number of steps for an environment through truncating the environment if a maximum number of timesteps is exceeded.
If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued. Critically, this is different from the terminated signal that originates from the underlying environment as part of the MDP. No vector wrapper exists.
- Example using the TimeLimit wrapper:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TimeLimit
>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> spec = gym.spec("CartPole-v1")
>>> spec.max_episode_steps
500
>>> env = gym.make("CartPole-v1")
>>> env  # TimeLimit is included within the environment stack
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env.spec
EnvSpec(id='CartPole-v1', ..., max_episode_steps=500, ...)
>>> env = gym.make("CartPole-v1", max_episode_steps=3)
>>> env.spec
EnvSpec(id='CartPole-v1', ..., max_episode_steps=3, ...)
>>> env = TimeLimit(CartPoleEnv(), max_episode_steps=10)
>>> env
<TimeLimit<CartPoleEnv instance>>
- Example of TimeLimit determining the episode step
>>> import gymnasium as gym
>>> env = gym.make("CartPole-v1", max_episode_steps=3)
>>> _ = env.reset(seed=123)
>>> _ = env.action_space.seed(123)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, False)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, False)
>>> _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> terminated, truncated
(False, True)
- Change logs:
v0.10.6 - Initially added
v0.25.0 - With the step API update, the termination and truncation signal is returned separately.
- Parameters:
env – The environment to apply the wrapper
max_episode_steps – The maximum number of steps per episode (if None, env.spec.max_episode_steps is used)
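The truncation rule described above can be sketched without Gymnasium. This is a minimal illustration under stated assumptions, not the library's implementation: `CountingEnv` and `StepLimit` are hypothetical names, and the toy environment never terminates on its own, so the only end-of-episode signal comes from the step limit.

```python
class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        self.state += action
        # (observation, reward, terminated, truncated, info)
        return self.state, 1.0, False, False, {}


class StepLimit:
    """Hypothetical stand-in for a time-limit wrapper (assumed logic)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0  # the step counter restarts every episode
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            truncated = True  # issued by the wrapper, not by the MDP
        return obs, reward, terminated, truncated, info


env = StepLimit(CountingEnv(), max_episode_steps=3)
env.reset()
flags = [env.step(1)[2:4] for _ in range(3)]
print(flags)  # [(False, False), (False, False), (False, True)]
```

The `terminated` flag is passed through untouched, which keeps the MDP's own end-of-episode signal separate from the time limit.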
- class gymnasium.wrappers.RecordVideo(env: gym.Env[ObsType, ActType], video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int = 0, name_prefix: str = 'rl-video', fps: int | None = None, disable_logger: bool = True)[source]#
Records videos of environment episodes using the environment’s render function.
Usually, you only want to record episodes intermittently, say every hundredth episode or every thousandth environment step. To do this, you can specify episode_trigger or step_trigger. They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively.
The episode_trigger should return True on the episode when recording should start. The step_trigger should return True on the n-th environment step at which recording should start, where n counts steps across all previous episodes. If neither episode_trigger nor step_trigger is passed, a default episode_trigger is used, namely capped_cubic_video_schedule(). This function starts a video at every episode whose index is a perfect cube (0, 1, 8, 27, ...) until episode 1000, and then at every 1000th episode. By default, the recording is stopped once reset is called. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.
No vector version of the wrapper exists.
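A trigger is just a function from an integer to a boolean. The sketch below writes out two such functions in plain Python: one following the default schedule as described above (perfect cubes below 1000, then every 1000th episode) and one firing every n-th episode or step. The names `cubic_then_capped` and `every_nth` are hypothetical and are not part of Gymnasium.

```python
def cubic_then_capped(episode_id: int) -> bool:
    """Return True for perfect-cube episodes below 1000, then every 1000th."""
    if episode_id < 1000:
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    return episode_id % 1000 == 0


def every_nth(n: int):
    """Build a trigger that fires on every n-th episode or step."""
    return lambda t: t % n == 0


recorded = [e for e in range(1000) if cubic_then_capped(e)]
print(recorded)  # [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
```

Either function could be passed as episode_trigger or step_trigger, since both accept an integer and return a boolean.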
- Examples - Run the environment for 50 episodes, and save the video every 10 episodes starting from the 0th:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> trigger = lambda t: t % 10 == 0
>>> env = RecordVideo(env, video_folder="./save_videos1", episode_trigger=trigger, disable_logger=True)
>>> for i in range(50):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos1"))
5
- Examples - Run the environment for 5 episodes, start a recording every 200th step, making sure each video is 100 frames long:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> trigger = lambda t: t % 200 == 0
>>> env = RecordVideo(env, video_folder="./save_videos2", step_trigger=trigger, video_length=100, disable_logger=True)
>>> for i in range(5):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     _ = env.action_space.seed(123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos2"))
2
- Examples - Run 3 episodes, record everything, but in chunks of 1000 frames:
>>> import os
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RecordVideo
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RecordVideo(env, video_folder="./save_videos3", video_length=1000, disable_logger=True)
>>> for i in range(3):
...     termination, truncation = False, False
...     _ = env.reset(seed=123)
...     while not (termination or truncation):
...         obs, rew, termination, truncation, info = env.step(env.action_space.sample())
...
>>> env.close()
>>> len(os.listdir("./save_videos3"))
2
- Change logs:
v0.25.0 - Initially added to replace wrappers.monitoring.VideoRecorder
- Parameters:
env – The environment that will be wrapped
video_folder (str) – The folder where the recordings will be stored
episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode
step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step
video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured
name_prefix (str) – Will be prepended to the filename of the recordings
fps (int) – The frames per second of the video. If None, the environment metadata render_fps key is used if it exists, otherwise a default value of 30 is used.
disable_logger (bool) – Whether to disable the moviepy logger; it is disabled by default
- class gymnasium.wrappers.RecordEpisodeStatistics(env: Env[ObsType, ActType], buffer_length: int = 100, stats_key: str = 'episode')[source]#
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, which indicates whether the env at the respective index has the episode statistics. A vector version of the wrapper exists, gymnasium.wrappers.vector.RecordEpisodeStatistics.
After the completion of an episode, info will look like this:
>>> info = {
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }
For vectorized environments, the output will be in the form of:
>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
time_queue (*) – The time length of the last buffer_length-many episodes
return_queue (*) – The cumulative rewards of the last buffer_length-many episodes
length_queue (*) – The lengths of the last buffer_length-many episodes
- Change logs:
v0.15.4 - Initially added
v1.0.0 - Removed vector environment support (see gymnasium.wrappers.vector.RecordEpisodeStatistics) and added attribute time_queue
- Parameters:
env (Env) – The environment to apply the wrapper
buffer_length – The size of the buffers return_queue, length_queue and time_queue
stats_key – The info key for the episode statistics
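The bookkeeping described for this wrapper can be illustrated with a small standalone tracker. This is a sketch under stated assumptions rather than the wrapper's actual code: `EpisodeStats`, `start_episode` and `record_step` are hypothetical names, and the bounded buffers are modelled with `collections.deque`.

```python
import time
from collections import deque


class EpisodeStats:
    """Hypothetical tracker mirroring the bookkeeping described above."""

    def __init__(self, buffer_length=100, stats_key="episode"):
        self.stats_key = stats_key
        # Bounded buffers: the oldest entry is dropped once the buffer is full
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self.time_queue = deque(maxlen=buffer_length)
        self.start_episode()

    def start_episode(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.start_time = time.perf_counter()

    def record_step(self, reward, terminated, truncated, info):
        self.episode_return += reward
        self.episode_length += 1
        if terminated or truncated:
            elapsed = time.perf_counter() - self.start_time
            info[self.stats_key] = {
                "r": self.episode_return,
                "l": self.episode_length,
                "t": elapsed,
            }
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
            self.time_queue.append(elapsed)
            self.start_episode()
        return info


stats = EpisodeStats(buffer_length=2)
for episode_rewards in ([1.0, 1.0], [0.5], [2.0, 2.0, 2.0]):
    for i, reward in enumerate(episode_rewards):
        done = i == len(episode_rewards) - 1
        info = stats.record_step(reward, terminated=done, truncated=False, info={})

print(info["episode"]["r"], info["episode"]["l"])  # 6.0 3
print(list(stats.return_queue))  # [0.5, 6.0]
```

With a buffer length of 2, only the two most recent episode returns remain in the queue after three episodes.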
- class gymnasium.wrappers.AtariPreprocessing(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)[source]#
Implements the common preprocessing techniques for Atari environments (excluding frame stacking).
For frame stacking use gymnasium.wrappers.FrameStackObservation. No vector version of the wrapper exists.
This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.
Specifically, the following preprocessing stages apply to the Atari environment:
Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.
Frame skipping: The number of frames skipped between steps, 4 by default.
Max-pooling: Pools over the most recent two observations from the frame skips.
Termination signal when a life is lost: When the agent loses a life, the environment is terminated. Turned off by default. Not recommended by Machado et al. (2018).
Resize to a square image: Resizes the Atari environment's original observation shape from 210x160 to 84x84 by default.
Grayscale observation: Makes the observation greyscale, enabled by default.
Grayscale new axis: Extends the last channel of the observation such that the image is 3-dimensional, not enabled by default.
Scale observation: Whether to scale the observation between [0, 1) or [0, 255), not scaled by default.
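The frame-skipping and max-pooling stages listed above can be sketched in plain Python. This is an illustrative sketch, not the wrapper's code: `skip_and_pool` is a hypothetical helper, frames are modelled as flat lists of pixel values, and summing the rewards over the skipped frames is an assumption about how the per-frame rewards are combined.

```python
def skip_and_pool(frames, rewards, frame_skip=4):
    """Consume `frame_skip` raw frames and return one pooled observation.

    `frames` and `rewards` are hypothetical per-frame emulator outputs.
    """
    total_reward = sum(rewards[:frame_skip])
    last_two = frames[:frame_skip][-2:]
    # Element-wise maximum over the most recent two frames
    pooled = [max(pixels) for pixels in zip(*last_two)]
    return pooled, total_reward


raw_frames = [[0, 0, 0], [9, 0, 0], [0, 5, 0], [0, 0, 7]]
raw_rewards = [0.0, 1.0, 0.0, 2.0]
obs, reward = skip_and_pool(raw_frames, raw_rewards)
print(obs, reward)  # [0, 5, 7] 3.0
```

Only the last two of the four frames contribute to the observation, so the bright pixel in the second frame does not appear in the pooled result.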
Example
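>>> import gymnasium as gym
>>> from gymnasium.wrappers import AtariPreprocessing
>>> env = gym.make("ALE/Adventure-v5", frameskip=1)  # disable frame-skipping in the original env
>>> env = AtariPreprocessing(env, noop_max=10, frame_skip=4, screen_size=84, terminal_on_life_loss=True, grayscale_obs=False, grayscale_newaxis=False)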
- Change logs:
Added in gym v0.12.2 (gym #1455)
- Parameters:
env (Env) – The environment to apply the preprocessing
noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset. To turn off, set to 0.
frame_skip (int) – The number of frames between new observations, which affects the frequency at which the agent experiences the game.
screen_size (int) – The size to which the Atari frame is resized.
terminal_on_life_loss (bool) – If True, then step() returns terminated=True whenever a life is lost.
grayscale_obs (bool) – If True, a grayscale observation is returned; otherwise, an RGB observation is returned.
grayscale_newaxis (bool) – If True and grayscale_obs=True, a channel axis is added to grayscale observations to make them 3-dimensional.
scale_obs (bool) – If True, an observation normalized to the range [0, 1) is returned. This also limits the memory optimization benefits of the FrameStack wrapper.
- Raises:
DependencyNotInstalled – opencv-python package not installed
ValueError – Disable frame-skipping in the original env
Uncommon Wrappers
- class gymnasium.wrappers.Autoreset(env: Env)[source]#
The wrapped environment is automatically reset when a terminated or truncated state is reached.
When calling step causes Env.step() to return terminated=True or truncated=True, Env.reset() is called, and the return format of self.step() is as follows: (new_obs, final_reward, final_terminated, final_truncated, info) with the new step API and (new_obs, final_reward, final_done, info) with the old step API. No vector version of the wrapper exists.
new_obs is the first observation after calling self.env.reset().
final_reward is the reward after calling self.env.step(), prior to calling self.env.reset().
final_terminated is the terminated value before calling self.env.reset().
final_truncated is the truncated value before calling self.env.reset(). final_terminated and final_truncated cannot both be False.
info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with an additional key “final_observation” containing the observation returned by the last call to self.env.step() and “final_info” containing the info dict returned by the last call to self.env.step().
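The return format described above (reset within the same step, with the last transition stored under “final_observation” and “final_info”) can be sketched as follows. This is a hedged illustration only: `ShortEpisodeEnv` and `step_with_same_step_reset` are hypothetical names, and the sketch reflects the description in this section rather than any particular release of the wrapper.

```python
class ShortEpisodeEnv:
    """Hypothetical toy environment that terminates on its third step."""

    def reset(self):
        self.t = 0
        return "start", {"episode_step": 0}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return f"obs{self.t}", 1.0, terminated, False, {"episode_step": self.t}


def step_with_same_step_reset(env, action):
    """Step `env`; on episode end, reset and stash the final transition."""
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        new_obs, new_info = env.reset()
        new_info = dict(new_info)
        new_info["final_observation"] = obs  # last observation of the old episode
        new_info["final_info"] = info  # last info dict of the old episode
        obs, info = new_obs, new_info
    return obs, reward, terminated, truncated, info


env = ShortEpisodeEnv()
env.reset()
for _ in range(3):
    obs, reward, terminated, truncated, info = step_with_same_step_reset(env, 0)

print(obs, terminated, info["final_observation"])  # start True obs3
```

The returned observation already belongs to the next episode, while the reward and the terminated flag still belong to the episode that just ended.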
Warning
When using this wrapper to collect rollouts, note that when Env.step() returns terminated or truncated, a new observation from after calling Env.reset() is returned by Env.step() alongside the final reward, terminated and truncated state from the previous episode. If you need the final state from the previous episode, you need to retrieve it via the “final_observation” key in the info dict. Make sure you know what you’re doing if you use this wrapper!
- Change logs:
v0.24.0 - Initially added as AutoResetWrapper
v1.0.0 - Renamed to Autoreset, and the autoreset order was changed to reset on the step after the environment terminates or truncates. As a result, “final_observation” and “final_info” are removed.
- Parameters:
env (gym.Env) – The environment to apply the wrapper
- class gymnasium.wrappers.PassiveEnvChecker(env: Env[ObsType, ActType])[source]#
A passive wrapper that surrounds the step, reset and render functions to check they follow Gymnasium’s API.
This wrapper is automatically applied during make and can be disabled with disable_env_checker. No vector version of the wrapper exists.
Example
>>> import gymnasium as gym
>>> env = gym.make("CartPole-v1")
>>> env
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env = gym.make("CartPole-v1", disable_env_checker=True)
>>> env
<TimeLimit<OrderEnforcing<CartPoleEnv<CartPole-v1>>>>
- Change logs:
v0.24.1 - Initially added, however broken in several ways
v0.25.0 - All bugs were fixed
v0.29.0 - Removed warnings for infinite bounds for Box observation and action spaces and irregular bound shapes
Initialises the wrapper with the environment and runs the observation and action space tests.
- class gymnasium.wrappers.HumanRendering(env: Env[ObsType, ActType])[source]#
Allows human-like rendering for environments that support “rgb_array” rendering.
This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven’t implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.
The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.
No vector version of the wrapper exists.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers import HumanRendering
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()  # This will start rendering to the screen
The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").
>>> env = gym.make("phys2d/CartPole-v1", render_mode="human")  # CartPoleJax-v1 doesn't implement human-rendering natively
>>> obs, _ = env.reset()  # This will start rendering to the screen
Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment’s) render method will always return an empty list:
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRendering(env)
>>> obs, _ = wrapped.reset()
>>> env.render()  # env.render() will always return an empty list!
[]
- Change logs:
v0.25.0 - Initially added
- Parameters:
env – The environment that is being wrapped
- class gymnasium.wrappers.OrderEnforcing(env: Env[ObsType, ActType], disable_render_order_enforcing: bool = False)[source]#
Will produce an error if step or render is called before reset.
No vector version of the wrapper exists.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers import OrderEnforcing
>>> env = gym.make("CartPole-v1", render_mode="human")
>>> env = OrderEnforcing(env)
>>> env.step(0)
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call `env.render()` before calling `env.reset()`, if this is an intended action, set `disable_render_order_enforcing=True` on the OrderEnforcer wrapper.
>>> _ = env.reset()
>>> env.render()
>>> _ = env.step(0)
>>> env.close()
- Change logs:
v0.22.0 - Initially added
v0.24.0 - Added order enforcing for the render function
- Parameters:
env – The environment to wrap
disable_render_order_enforcing – Whether to disable render order enforcing
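The ordering check amounts to remembering whether reset has been called. The sketch below shows that idea in isolation; `OrderGuard`, `ResetNeededError` and `DummyEnv` are hypothetical names used for illustration, and the error type only stands in for gymnasium.error.ResetNeeded.

```python
class ResetNeededError(RuntimeError):
    """Hypothetical error type standing in for gymnasium.error.ResetNeeded."""


class DummyEnv:
    """Hypothetical environment with trivial reset, step and render."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}

    def render(self):
        return "frame"


class OrderGuard:
    """Hypothetical guard that refuses step/render before the first reset."""

    def __init__(self, env, disable_render_order_enforcing=False):
        self.env = env
        self.has_reset = False
        self.disable_render_order_enforcing = disable_render_order_enforcing

    def reset(self):
        self.has_reset = True
        return self.env.reset()

    def step(self, action):
        if not self.has_reset:
            raise ResetNeededError("Cannot call step() before calling reset()")
        return self.env.step(action)

    def render(self):
        if not self.disable_render_order_enforcing and not self.has_reset:
            raise ResetNeededError("Cannot call render() before calling reset()")
        return self.env.render()


guard = OrderGuard(DummyEnv())
try:
    guard.step(0)
    error_message = None
except ResetNeededError as err:
    error_message = str(err)

print(error_message)  # Cannot call step() before calling reset()
guard.reset()
print(guard.step(0))  # (0, 0.0, False, False, {})
```

Passing disable_render_order_enforcing=True in this sketch lets render through before reset while step stays guarded.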
- class gymnasium.wrappers.RenderCollection(env: Env[ObsType, ActType], pop_frames: bool = True, reset_clean: bool = True)[source]#
Collect rendered frames of an environment such that render returns a list[RenderedFrame].
No vector version of the wrapper exists.
Example
Return the list of frames for the number of steps render wasn’t called.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
6
>>> frames = env.render()
>>> len(frames)
0
Return the list of frames for the number of steps the episode was running.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env, pop_frames=False)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
6
>>> frames = env.render()
>>> len(frames)
6
Collect all frames for all episodes, without clearing them when render is called.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RenderCollection
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> env = RenderCollection(env, pop_frames=False, reset_clean=False)
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> _ = env.reset(seed=123)
>>> for _ in range(5):
...     _ = env.step(env.action_space.sample())
...
>>> frames = env.render()
>>> len(frames)
12
>>> frames = env.render()
>>> len(frames)
12
- Change logs:
v0.26.2 - Initially added
- Parameters:
env – The environment that is being wrapped
pop_frames (bool) – If true, clear the collection frames after render is called. Default value is True.
reset_clean (bool) – If true, clear the collection frames when reset is called. Default value is True.
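How pop_frames and reset_clean interact can be seen in a small standalone model. This is a sketch under stated assumptions, not the wrapper's source: `FrameCollector` is a hypothetical class, frames are placeholder strings, and it is assumed that one frame is captured on reset and one on every step, which matches the frame counts in the examples above.

```python
class FrameCollector:
    """Hypothetical collector mirroring the pop_frames/reset_clean flags."""

    def __init__(self, pop_frames=True, reset_clean=True):
        self.pop_frames = pop_frames
        self.reset_clean = reset_clean
        self.frames = []
        self.counter = 0

    def _capture(self):
        # Stand-in for rendering one frame of the wrapped environment
        self.frames.append(f"frame-{self.counter}")
        self.counter += 1

    def reset(self):
        if self.reset_clean:
            self.frames = []  # drop frames from the previous episode
        self._capture()

    def step(self):
        self._capture()

    def render(self):
        frames = self.frames
        if self.pop_frames:
            self.frames = []  # hand over the frames and start a new list
        return frames


popping = FrameCollector()
popping.reset()
for _ in range(5):
    popping.step()
first, second = popping.render(), popping.render()
print(len(first), len(second))  # 6 0

keeping = FrameCollector(pop_frames=False, reset_clean=False)
for _ in range(2):
    keeping.reset()
    for _ in range(5):
        keeping.step()
print(len(keeping.render()))  # 12
```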
Data Conversion Wrappers
- class gymnasium.wrappers.JaxToNumpy(env: Env[ObsType, ActType])[source]#
Wraps a Jax-based environment such that it can be interacted with through NumPy arrays.
Actions must be provided as numpy arrays and observations will be returned as numpy arrays. A vector version of the wrapper exists, gymnasium.wrappers.vector.JaxToNumpy.
Notes
The Jax to NumPy and NumPy to Jax conversions do not guarantee a roundtrip (jax -> numpy -> jax, or vice versa). The reason is that jax does not support non-array values, so a numpy int32(5) becomes DeviceArray([5], dtype=jnp.int32).
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers import JaxToNumpy
>>> env = gym.make("JaxEnv-vx")
>>> env = JaxToNumpy(env)
>>> obs, _ = env.reset(seed=123)
>>> type(obs)
<class 'numpy.ndarray'>
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> type(obs)
<class 'numpy.ndarray'>
>>> type(reward)
<class 'float'>
>>> type(terminated)
<class 'bool'>
>>> type(truncated)
<class 'bool'>
- Change logs:
v1.0.0 - Initially added
- Parameters:
env – the jax environment to wrap
- class gymnasium.wrappers.JaxToTorch(env: gym.Env, device: Device | None = None)[source]#
Wraps a Jax-based environment so that it can be interacted with through PyTorch Tensors.
Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors. A vector version of the wrapper exists, gymnasium.wrappers.vector.JaxToTorch.
Note
Rendered frames are returned as NumPy arrays, not PyTorch Tensors.
Example
>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.wrappers import JaxToTorch
>>> env = gym.make("JaxEnv-vx")
>>> env = JaxToTorch(env)
>>> obs, _ = env.reset(seed=123)
>>> type(obs)
<class 'torch.Tensor'>
>>> action = torch.tensor(env.action_space.sample())
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> type(obs)
<class 'torch.Tensor'>
>>> type(reward)
<class 'float'>
>>> type(terminated)
<class 'bool'>
>>> type(truncated)
<class 'bool'>
- Change logs:
v1.0.0 - Initially added
- Parameters:
env – The Jax-based environment to wrap
device – The device the torch Tensors should be moved to
- class gymnasium.wrappers.NumpyToTorch(env: gym.Env, device: Device | None = None)[source]#
Wraps a NumPy-based environment such that it can be interacted with through PyTorch Tensors.
Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors. A vector version of the wrapper exists, gymnasium.wrappers.vector.NumpyToTorch.
Note
Rendered frames are returned as NumPy arrays, not PyTorch Tensors.
Example
>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.wrappers import NumpyToTorch
>>> env = gym.make("CartPole-v1")
>>> env = NumpyToTorch(env)
>>> obs, _ = env.reset(seed=123)
>>> type(obs)
<class 'torch.Tensor'>
>>> action = torch.tensor(env.action_space.sample())
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> type(obs)
<class 'torch.Tensor'>
>>> type(reward)
<class 'float'>
>>> type(terminated)
<class 'bool'>
>>> type(truncated)
<class 'bool'>
- Change logs:
v1.0.0 - Initially added
- Parameters:
env – The NumPy-based environment to wrap
device – The device the torch Tensors should be moved to