Wrappers#

Observation Wrappers#

class gymnasium.experimental.wrappers.LambdaObservationV0(env: gym.Env[ObsType, ActType], func: Callable[[ObsType], Any], observation_space: gym.Space[WrapperObsType] | None)[source]#

Transforms an observation via a function provided to the wrapper.

The function func will be applied to all observations. If the observations from func are outside the bounds of the env’s observation space, provide an observation_space.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import LambdaObservationV0
>>> import numpy as np
>>> np.random.seed(0)
>>> env = gym.make("CartPole-v1")
>>> env = LambdaObservationV0(env, lambda obs: obs + 0.1 * np.random.random(obs.shape), env.observation_space)
>>> env.reset(seed=42)
(array([0.08227695, 0.06540678, 0.09613613, 0.07422512]), {})
Parameters:
  • env – The environment to wrap

  • func – A function that will transform an observation. If this transformed observation is outside the observation space of env.observation_space then provide an observation_space.

  • observation_space – The observation space of the wrapper; if None, it is assumed to be the same as env.observation_space.

class gymnasium.experimental.wrappers.FilterObservationV0(env: gym.Env[ObsType, ActType], filter_keys: Sequence[str | int])[source]#

Filters a Dict or Tuple observation space by the given keys or indexes.

Example

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.experimental.wrappers import FilterObservationV0
>>> env = gym.make("CartPole-v1")
>>> env = gym.wrappers.TransformObservation(env, lambda obs: {'obs': obs, 'time': 0})
>>> env.observation_space = gym.spaces.Dict(obs=env.observation_space, time=gym.spaces.Discrete(1))
>>> env.reset(seed=42)
({'obs': array([ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ], dtype=float32), 'time': 0}, {})
>>> env = FilterObservationV0(env, filter_keys=['time'])
>>> env.reset(seed=42)
({'time': 0}, {})
>>> env.step(0)
({'time': 0}, 1.0, False, False, {})
Parameters:
  • env – The environment to wrap

  • filter_keys – The subspaces to be included; use a list of strings for Dict spaces or integers for Tuple spaces, respectively

class gymnasium.experimental.wrappers.FlattenObservationV0(env: Env[ObsType, ActType])[source]#

Observation wrapper that flattens the observation.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import FlattenObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> env = FlattenObservationV0(env)
>>> env.observation_space.shape
(27648,)
>>> obs, _ = env.reset()
>>> obs.shape
(27648,)
Parameters:

env – The environment to wrap

class gymnasium.experimental.wrappers.GrayscaleObservationV0(env: Env[ObsType, ActType], keep_dim: bool = False)[source]#

Observation wrapper that converts an RGB image to grayscale.

If keep_dim is True, the channel dimension is kept in the observation.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import GrayscaleObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> grayscale_env = GrayscaleObservationV0(env)
>>> grayscale_env.observation_space.shape
(96, 96)
>>> grayscale_env = GrayscaleObservationV0(env, keep_dim=True)
>>> grayscale_env.observation_space.shape
(96, 96, 1)
Parameters:
  • env – The environment to wrap

  • keep_dim – Whether to keep the channel dimension in the observation. If True, observations are 3-dimensional (height, width, 1); otherwise, 2-dimensional (height, width).

class gymnasium.experimental.wrappers.ResizeObservationV0(env: Env[ObsType, ActType], shape: tuple[int, ...])[source]#

Resizes image observations to the given shape using OpenCV.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ResizeObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> resized_env = ResizeObservationV0(env, (32, 32))
>>> resized_env.observation_space.shape
(32, 32, 3)
Parameters:
  • env – The environment to wrap

  • shape – The resized observation shape

class gymnasium.experimental.wrappers.ReshapeObservationV0(env: gym.Env[ObsType, ActType], shape: int | tuple[int, ...])[source]#

Reshapes array-based observations to the given shape.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ReshapeObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> reshape_env = ReshapeObservationV0(env, (24, 4, 96, 1, 3))
>>> reshape_env.observation_space.shape
(24, 4, 96, 1, 3)
Parameters:
  • env – The environment to wrap

  • shape – The new shape of the observation space

class gymnasium.experimental.wrappers.RescaleObservationV0(env: gym.Env[ObsType, ActType], min_obs: np.floating | np.integer | np.ndarray, max_obs: np.floating | np.integer | np.ndarray)[source]#

Linearly rescales observation to between a minimum and maximum value.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import RescaleObservationV0
>>> import numpy as np
>>> env = gym.make("Pendulum-v1")
>>> env.observation_space
Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)
>>> env = RescaleObservationV0(env, np.array([-2, -1, -10], dtype=np.float32), np.array([1, 0, 1], dtype=np.float32))
>>> env.observation_space
Box([ -2.  -1. -10.], [1. 0. 1.], (3,), float32)
Parameters:
  • env – The environment to wrap

  • min_obs – The new minimum observation bound

  • max_obs – The new maximum observation bound

class gymnasium.experimental.wrappers.DtypeObservationV0(env: Env[ObsType, ActType], dtype: Any)[source]#

Observation wrapper for transforming the dtype of an observation.

Note

This is only compatible with Box, Discrete, MultiDiscrete and MultiBinary observation spaces
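
Example (a minimal sketch; the CartPole-v1 environment and the np.float64 target dtype are illustrative)

>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import DtypeObservationV0
>>> env = gym.make("CartPole-v1")
>>> env.observation_space.dtype
dtype('float32')
>>> env = DtypeObservationV0(env, dtype=np.float64)
... # observations and the observation space now use np.float64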

Parameters:
  • env – The environment to wrap

  • dtype – The new dtype of the observation

class gymnasium.experimental.wrappers.PixelObservationV0(env: Env[ObsType, ActType], pixels_only: bool = True, pixels_key: str = 'pixels', obs_key: str = 'state')[source]#

Adds the rendered observations to the environment’s observations.

Observations of this wrapper will be dictionaries of images. You can also choose to add the observation of the base environment to this dictionary. In that case, if the base environment has an observation space of type Dict, the dictionary of rendered images will be updated with the base environment’s observation. If, however, the observation space is of type Box, the base environment’s observation (which will be an element of the Box space) will be added to the dictionary under the key “state”.
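
Example (a minimal sketch, assuming CartPole-v1 can be rendered as "rgb_array", i.e. pygame is installed)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import PixelObservationV0
>>> env = gym.make("CartPole-v1", render_mode="rgb_array")
>>> env = PixelObservationV0(env, pixels_only=False)
>>> obs, _ = env.reset()
>>> sorted(obs)
['pixels', 'state']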

Parameters:
  • env – The environment to wrap.

  • pixels_only (bool) – If True (default), the original observation returned by the wrapped environment will be discarded, and a dictionary observation will only include pixels. If False, the observation dictionary will contain both the original observations and the pixel observations.

  • pixels_key – Optional custom string specifying the pixel key. Defaults to “pixels”

  • obs_key – Optional custom string specifying the obs key. Defaults to “state”

class gymnasium.experimental.wrappers.NormalizeObservationV0(env: Env[ObsType, ActType], epsilon: float = 1e-8)[source]#

This wrapper will normalize observations s.t. each coordinate is centered with unit variance.

The property _update_running_mean allows freezing/continuing the running mean calculation of the observation statistics. If True (default), the RunningMeanStd will be updated every time self.observation() is called. If False, the calculated statistics are used but no longer updated; this may be used during evaluation.

Note

The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently.
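
Example (a minimal sketch; the exact observation values depend on the running statistics)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import NormalizeObservationV0
>>> env = gym.make("CartPole-v1")
>>> env = NormalizeObservationV0(env)
>>> _ = env.reset(seed=42)
>>> _ = env.action_space.seed(42)
>>> obs, _, _, _, _ = env.step(env.action_space.sample())
>>> obs.shape  # observations keep their shape but are rescaled by the running mean and std
(4,)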

Parameters:
  • env (Env) – The environment to apply the wrapper

  • epsilon – A stability parameter that is used when scaling the observations.

class gymnasium.experimental.wrappers.TimeAwareObservationV0(env: Env[ObsType, ActType], flatten: bool = False, normalize_time: bool = True, *, dict_time_key: str = 'time')[source]#

Augment the observation with time information of the episode.

If normalize_time is True, the time is represented as a normalized value between [0, 1]; otherwise, if False, the number of timesteps remaining before truncation occurs is given as an integer.

For environments with Dict observation spaces, the time information is automatically added in the key “time” (can be changed through dict_time_key) and for environments with Tuple observation space, the time information is added as the final element in the tuple. Otherwise, the observation space is transformed into a Dict observation space with two keys, “obs” for the base environment’s observation and “time” for the time information.

To flatten the observation, use the flatten parameter which will use the gymnasium.spaces.utils.flatten() function.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import TimeAwareObservationV0
>>> env = gym.make("CartPole-v1")
>>> env = TimeAwareObservationV0(env)
>>> env.observation_space
Dict('obs': Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), 'time': Box(0.0, 1.0, (1,), float32))
>>> env.reset(seed=42)[0]
{'obs': array([ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ], dtype=float32), 'time': array([0.], dtype=float32)}
>>> _ = env.action_space.seed(42)
>>> env.step(env.action_space.sample())[0]
{'obs': array([ 0.02727336, -0.20172954,  0.03625453,  0.32351476], dtype=float32), 'time': array([0.002], dtype=float32)}
Unnormalized time observation space example:
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservationV0(env, normalize_time=False)
>>> env.observation_space
Dict('obs': Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), 'time': Box(0, 500, (1,), int32))
>>> env.reset(seed=42)[0]
{'obs': array([ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ], dtype=float32), 'time': array([500], dtype=int32)}
>>> _ = env.action_space.seed(42)[0]
>>> env.step(env.action_space.sample())[0]
{'obs': array([ 0.02727336, -0.20172954,  0.03625453,  0.32351476], dtype=float32), 'time': array([499], dtype=int32)}
Flatten observation space example:
>>> env = gym.make("CartPole-v1")
>>> env = TimeAwareObservationV0(env, flatten=True)
>>> env.observation_space
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38
  0.0000000e+00], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38 1.0000000e+00], (5,), float32)
>>> env.reset(seed=42)[0]
array([ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ,  0.        ],
      dtype=float32)
>>> _ = env.action_space.seed(42)
>>> env.step(env.action_space.sample())[0]
array([ 0.02727336, -0.20172954,  0.03625453,  0.32351476,  0.002     ],
      dtype=float32)
Parameters:
  • env – The environment to apply the wrapper

  • flatten – Flatten the observation to a Box of a single dimension

  • normalize_time – If True, return time in the range [0, 1]; otherwise, return time as the remaining timesteps before truncation

  • dict_time_key – For environment with a Dict observation space, the key for the time space. By default, “time”.

class gymnasium.experimental.wrappers.FrameStackObservationV0(env: gym.Env[ObsType, ActType], stack_size: int, *, zeros_obs: ObsType | None = None)[source]#

Observation wrapper that stacks the observations in a rolling manner.

For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. For environment ‘Pendulum-v1’, the original observation is an array with shape [3], so if we stack 4 observations, the processed observation has shape [4, 3].

Note

  • After reset() is called, the frame buffer will be filled with the initial observation, i.e. the observation returned by reset() will consist of stack_size identical frames.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import FrameStackObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env = FrameStackObservationV0(env, 4)
>>> env.observation_space
Box(0, 255, (4, 96, 96, 3), uint8)
>>> obs, _ = env.reset()
>>> obs.shape
(4, 96, 96, 3)
Parameters:
  • env – The environment to apply the wrapper

  • stack_size – The number of frames to stack, with zeros_obs used for the initial padding.

  • zeros_obs – Keyword only parameter that allows a custom padding observation at reset()

class gymnasium.experimental.wrappers.DelayObservationV0(env: Env[ObsType, ActType], delay: int)[source]#

Wrapper which adds a delay to the returned observation.

Until delay timesteps have elapsed, the returned observation is an array of zeros with the same shape as the observation space.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import DelayObservationV0
>>> env = gym.make("CartPole-v1")
>>> env.reset(seed=123)
(array([ 0.01823519, -0.0446179 , -0.02796401, -0.03156282], dtype=float32), {})
>>> env = DelayObservationV0(env, delay=2)
>>> env.reset(seed=123)
(array([0., 0., 0., 0.], dtype=float32), {})
>>> env.step(env.action_space.sample())
(array([0., 0., 0., 0.], dtype=float32), 1.0, False, False, {})
>>> env.step(env.action_space.sample())
(array([ 0.01823519, -0.0446179 , -0.02796401, -0.03156282], dtype=float32), 1.0, False, False, {})

Note

This does not support random delay values. If users are interested, please raise an issue or create a pull request to add this feature.

Parameters:
  • env – The environment to wrap

  • delay – The number of timesteps to delay observations

class gymnasium.experimental.wrappers.AtariPreprocessingV0(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)[source]#

Atari 2600 preprocessing wrapper.

This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.

Specifically, the following preprocessing stages are applied to the Atari environment:

  • Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.

  • Frame skipping: The number of frames skipped between steps, 4 by default

  • Max-pooling: Pools over the most recent two observations from the frame skips

  • Termination signal when a life is lost: When the agent loses a life during the game, the environment terminates.

    Turned off by default. Not recommended by Machado et al. (2018).

  • Resize to a square image: Resizes the Atari environment’s original observation shape from 210x160 to 84x84 by default

  • Grayscale observation: Whether the observation is returned in colour or grayscale; grayscale by default.

  • Scale observation: Whether to scale the observation to the range [0, 1) or leave it in [0, 255); not scaled by default.
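
Example (a minimal sketch, assuming ale-py and the Atari ROMs are installed; frame-skipping is disabled in the base environment as required)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import AtariPreprocessingV0
>>> env = gym.make("ALE/Breakout-v5", frameskip=1)
>>> env = AtariPreprocessingV0(env, frame_skip=4, screen_size=84, grayscale_obs=True)
>>> env.observation_space.shape
(84, 84)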

Parameters:
  • env (Env) – The environment to apply the preprocessing

  • noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset; to turn off, set to 0.

  • frame_skip (int) – The number of frames between new observations, affecting the frequency at which the agent experiences the game.

  • screen_size (int) – The size to which the Atari frame is resized.

  • terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.

  • grayscale_obs (bool) – if True, a grayscale observation is returned; otherwise, an RGB observation is returned.

  • grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.

  • scale_obs (bool) – if True, the observation is normalized to the range [0, 1). This also limits the memory-optimization benefits of the FrameStack wrapper.

Raises:
  • DependencyNotInstalled – opencv-python package not installed

  • ValueError – If frame-skipping is not disabled in the original environment

Action Wrappers#

class gymnasium.experimental.wrappers.LambdaActionV0(env: gym.Env[ObsType, ActType], func: Callable[[WrapperActType], ActType], action_space: Space[WrapperActType] | None)[source]#

A wrapper that provides a function to modify the action passed to step().
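
Example (a minimal sketch; the action-doubling lambda and the reuse of the base action space are illustrative)

>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import LambdaActionV0
>>> env = gym.make("Pendulum-v1")
>>> env = LambdaActionV0(env, lambda act: 2 * act, env.action_space)
>>> _ = env.reset(seed=42)
>>> _ = env.step(np.array([0.5], dtype=np.float32))
... # The base environment receives np.array([1.0])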

Parameters:
  • env – The environment to wrap

  • func – Function to apply to the step()’s action

  • action_space – The updated action space of the wrapper given the function.

class gymnasium.experimental.wrappers.ClipActionV0(env: Env[ObsType, ActType])[source]#

Clips the continuous action to within the valid bounds of the Box action space.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ClipActionV0
>>> import numpy as np
>>> env = gym.make("Hopper-v4", disable_env_checker=True)
>>> env = ClipActionV0(env)
>>> env.action_space
Box(-inf, inf, (3,), float32)
>>> _ = env.reset(seed=42)
>>> _ = env.step(np.array([5.0, -2.0, 0.0], dtype=np.float32))
... # Executes the action np.array([1.0, -1.0, 0]) in the base environment
Parameters:

env – The environment to wrap

class gymnasium.experimental.wrappers.RescaleActionV0(env: gym.Env[ObsType, ActType], min_action: float | int | np.ndarray, max_action: float | int | np.ndarray)[source]#

Affinely rescales the continuous action space of the environment to the range [min_action, max_action].

The base environment env must have an action space of type spaces.Box. If min_action or max_action are numpy arrays, the shape must match the shape of the environment’s action space.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import RescaleActionV0
>>> import numpy as np
>>> env = gym.make("Hopper-v4", disable_env_checker=True)
>>> _ = env.reset(seed=42)
>>> obs, _, _, _, _ = env.step(np.array([1, 1, 1], dtype=np.float32))
>>> _ = env.reset(seed=42)
>>> min_action = -0.5
>>> max_action = np.array([0.0, 0.5, 0.75], dtype=np.float32)
>>> wrapped_env = RescaleActionV0(env, min_action=min_action, max_action=max_action)
>>> wrapped_env_obs, _, _, _, _ = wrapped_env.step(max_action)
>>> np.all(obs == wrapped_env_obs)
True
Parameters:
  • env (Env) – The environment to wrap

  • min_action (float, int or np.ndarray) – The min values for each action. This may be a numpy array or a scalar.

  • max_action (float, int or np.ndarray) – The max values for each action. This may be a numpy array or a scalar.

class gymnasium.experimental.wrappers.StickyActionV0(env: Env[ObsType, ActType], repeat_action_probability: float)[source]#

Wrapper which adds a probability of repeating the previous action.

This wrapper follows the implementation proposed by Machado et al., 2018 in Section 5.2 on page 12.
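
Example (a minimal sketch, assuming CartPole-v1)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import StickyActionV0
>>> env = gym.make("CartPole-v1")
>>> env = StickyActionV0(env, repeat_action_probability=0.25)
>>> _ = env.reset(seed=42)
>>> _ = env.step(1)
>>> _ = env.step(0)
... # With probability 0.25, the previous action (1) is repeated instead of 0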

Parameters:
  • env (Env) – the wrapped environment

  • repeat_action_probability (int | float) – The probability of repeating the old action.

Reward Wrappers#

class gymnasium.experimental.wrappers.LambdaRewardV0(env: Env[ObsType, ActType], func: Callable[[SupportsFloat], SupportsFloat])[source]#

A reward wrapper that allows a custom function to modify the step reward.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import LambdaRewardV0
>>> env = gym.make("CartPole-v1")
>>> env = LambdaRewardV0(env, lambda r: 2 * r + 1)
>>> _ = env.reset()
>>> _, rew, _, _, _ = env.step(0)
>>> rew
3.0
Parameters:
  • env (Env) – The environment to wrap

  • func (Callable) – The function to apply to the reward

class gymnasium.experimental.wrappers.ClipRewardV0(env: gym.Env[ObsType, ActType], min_reward: float | np.ndarray | None = None, max_reward: float | np.ndarray | None = None)[source]#

A wrapper that clips the rewards for an environment between an upper and lower bound.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ClipRewardV0
>>> env = gym.make("CartPole-v1")
>>> env = ClipRewardV0(env, 0, 0.5)
>>> _ = env.reset()
>>> _, rew, _, _, _ = env.step(1)
>>> rew
0.5
Parameters:
  • env (Env) – The environment to wrap

  • min_reward (Union[float, np.ndarray]) – lower bound to apply

  • max_reward (Union[float, np.ndarray]) – upper bound to apply

class gymnasium.experimental.wrappers.NormalizeRewardV1(env: Env[ObsType, ActType], gamma: float = 0.99, epsilon: float = 1e-8)[source]#

This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

The property _update_running_mean allows freezing/continuing the running mean calculation of the reward statistics. If True (default), the RunningMeanStd will be updated every time self.normalize() is called. If False, the calculated statistics are used but no longer updated; this may be used during evaluation.

Note

In v0.27, NormalizeReward was updated as the forward discounted reward estimate was incorrectly computed in Gym v0.25+. For more detail, read [#3154](https://github.com/openai/gym/pull/3152).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
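
Example (a minimal sketch; the returned reward depends on the running variance estimate)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import NormalizeRewardV1
>>> env = gym.make("CartPole-v1")
>>> env = NormalizeRewardV1(env, gamma=0.99)
>>> _ = env.reset(seed=42)
>>> _, rew, _, _, _ = env.step(0)
... # rew is the raw reward scaled by the running estimate of the discounted-return variance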

Parameters:
  • env (Env) – The environment to apply the wrapper

  • epsilon (float) – A stability parameter

  • gamma (float) – The discount factor that is used in the exponential moving average.

Other Wrappers#

class gymnasium.experimental.wrappers.AutoresetV0(env: Env[ObsType, ActType])[source]#

A wrapper that provides automatic reset functionality for gymnasium environments when calling self.step().
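
Example (a minimal sketch; assumes the wrapper resets the environment on the step after an episode ends)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import AutoresetV0
>>> env = AutoresetV0(gym.make("CartPole-v1"))
>>> _ = env.reset(seed=42)
>>> _ = env.action_space.seed(42)
>>> terminated = truncated = False
>>> while not (terminated or truncated):
...     _, _, terminated, truncated, _ = env.step(env.action_space.sample())
>>> _ = env.step(env.action_space.sample())
... # No manual reset() call is required; the wrapper resets automatically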

Parameters:

env (gym.Env) – The environment to apply the wrapper

class gymnasium.experimental.wrappers.PassiveEnvCheckerV0(env: Env[ObsType, ActType])[source]#

A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gymnasium API.

Initialises the wrapper with the environment, running the observation and action space tests.
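
Example (a minimal sketch; the base environment’s default checker is disabled so that only this wrapper performs the checks)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import PassiveEnvCheckerV0
>>> env = PassiveEnvCheckerV0(gym.make("CartPole-v1", disable_env_checker=True))
>>> _ = env.reset(seed=42)
>>> _ = env.step(env.action_space.sample())
... # reset, step and render are validated against the gymnasium API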

class gymnasium.experimental.wrappers.OrderEnforcingV0(env: Env[ObsType, ActType], disable_render_order_enforcing: bool = False)[source]#

A wrapper that will produce an error if step() is called before an initial reset().

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import OrderEnforcingV0
>>> env = gym.make("CartPole-v1", render_mode="human")
>>> env = OrderEnforcingV0(env)
>>> env.step(0)
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
Traceback (most recent call last):
    ...
gymnasium.error.ResetNeeded: Cannot call `env.render()` before calling `env.reset()`, if this is a intended action, set `disable_render_order_enforcing=True` on the OrderEnforcer wrapper.
>>> _ = env.reset()
>>> env.render()
>>> _ = env.step(0)
>>> env.close()
Parameters:
  • env – The environment to wrap

  • disable_render_order_enforcing – Whether to disable render order enforcing

class gymnasium.experimental.wrappers.RecordEpisodeStatisticsV0(env: gym.Env[ObsType, ActType], buffer_length: int | None = 100, stats_key: str = 'episode')[source]#

This wrapper will keep track of cumulative rewards and episode lengths.

At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, indicating whether the env at the respective index has the episode statistics.

After the completion of an episode, info will look like this:

>>> info = {
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }

For vectorized environments the output will be in the form of:

>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }

Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
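
Example (a minimal sketch running a single episode to completion)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import RecordEpisodeStatisticsV0
>>> env = RecordEpisodeStatisticsV0(gym.make("CartPole-v1"))
>>> _ = env.reset(seed=42)
>>> _ = env.action_space.seed(42)
>>> terminated = truncated = False
>>> while not (terminated or truncated):
...     _, _, terminated, truncated, info = env.step(env.action_space.sample())
>>> "episode" in info
True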

Variables:
  • episode_reward_buffer – The cumulative rewards of the last buffer_length-many episodes

  • episode_length_buffer – The lengths of the last buffer_length-many episodes

Parameters:
  • env (Env) – The environment to apply the wrapper

  • buffer_length – The size of the buffers return_queue and length_queue

  • stats_key – The info key for the episode statistics

Rendering Wrappers#

class gymnasium.experimental.wrappers.RecordVideoV0(env: gym.Env[ObsType, ActType], video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int = 0, name_prefix: str = 'rl-video', fps: int | None = None, disable_logger: bool = False)[source]#

This wrapper records videos of rollouts.

Usually, you only want to record episodes intermittently, say every hundredth episode. To do this, you can specify episode_trigger or step_trigger. They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively. If neither episode_trigger nor step_trigger is passed, a default episode_trigger will be employed, i.e. capped_cubic_video_schedule. This function starts a video at every episode whose index is a perfect cube (1, 8, 27, ...) until 1000 and then every 1000 episodes. By default, the recording will be stopped once reset is called. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.
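
Example (a minimal sketch, assuming moviepy is installed and ./videos is writable; the trigger function is illustrative)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import RecordVideoV0
>>> env = gym.make("CartPole-v1", render_mode="rgb_array")
>>> env = RecordVideoV0(env, video_folder="./videos", episode_trigger=lambda ep: ep % 100 == 0, disable_logger=True)
>>> _ = env.reset(seed=42)
>>> _ = env.step(env.action_space.sample())
>>> env.close()  # close() finalises any recording still in progress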

Parameters:
  • env – The environment that will be wrapped

  • video_folder (str) – The folder where the recordings will be stored

  • episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode

  • step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step

  • video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured

  • name_prefix (str) – Will be prepended to the filename of the recordings

  • fps (int) – The frames per second in the video. The default value is the one specified in the environment metadata. If the environment metadata doesn’t specify render_fps, the value 30 is used.

  • disable_logger (bool) – Whether to disable moviepy logger or not

class gymnasium.experimental.wrappers.HumanRenderingV0(env: Env[ObsType, ActType])[source]#

Performs human rendering for an environment that only supports “rgb_array” rendering.

This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven’t implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.

The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.

Example

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import HumanRenderingV0
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRenderingV0(env)
>>> obs, _ = wrapped.reset()     # This will start rendering to the screen

The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").

>>> env = gym.make("phys2d/CartPole-v1", render_mode="human")      # CartPoleJax-v1 doesn't implement human-rendering natively
>>> obs, _ = env.reset()     # This will start rendering to the screen

Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment’s) render method will always return an empty list:

>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRenderingV0(env)
>>> obs, _ = wrapped.reset()
>>> env.render() # env.render() will always return an empty list!
[]
Parameters:

env – The environment that is being wrapped

class gymnasium.experimental.wrappers.RenderCollectionV0(env: Env[ObsType, ActType], pop_frames: bool = True, reset_clean: bool = True)[source]#

Collects the rendered frames of an environment such that render() returns a list[RenderedFrame].
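
Example (a minimal sketch, assuming CartPole-v1 with render_mode="rgb_array")

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import RenderCollectionV0
>>> env = RenderCollectionV0(gym.make("CartPole-v1", render_mode="rgb_array"))
>>> _ = env.reset(seed=42)
>>> _ = env.step(env.action_space.sample())
>>> frames = env.render()
... # a list of the frames collected so far, cleared afterwards because pop_frames=True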

Parameters:
  • env – The environment that is being wrapped

  • pop_frames (bool) – If true, clear the collected frames after render() is called. Default value is True.

  • reset_clean (bool) – If true, clear the collected frames when reset() is called. Default value is True.

Environment data conversion#

class gymnasium.experimental.wrappers.JaxToNumpyV0(env: Env[ObsType, ActType])[source]#

Wraps a jax environment so that it can be interacted with through numpy arrays.

Actions must be provided as numpy arrays and observations will be returned as numpy arrays.

Notes

The Jax to NumPy and NumPy to Jax conversions do not guarantee a roundtrip (jax -> numpy -> jax) and vice versa. The reason is that Jax does not support non-array values; therefore np.int32(5) is converted to DeviceArray([5], dtype=jnp.int32).
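
Example (a minimal sketch, assuming jax is installed and using the Jax-based phys2d/CartPole-v1 environment)

>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import JaxToNumpyV0
>>> env = gym.make("phys2d/CartPole-v1")
>>> env = JaxToNumpyV0(env)
>>> obs, _ = env.reset(seed=42)  # obs is returned as a numpy array
>>> _ = env.step(env.action_space.sample())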

Parameters:

env – the jax environment to wrap

class gymnasium.experimental.wrappers.JaxToTorchV0(env: gym.Env, device: Device | None = None)[source]#

Wraps a Jax-based environment so that it can be interacted with through PyTorch Tensors.

Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors.

Note

Rendered frames are returned as NumPy arrays, not PyTorch Tensors.
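
Example (a minimal sketch, assuming both jax and torch are installed, using the Jax-based phys2d/CartPole-v1 environment)

>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import JaxToTorchV0
>>> env = JaxToTorchV0(gym.make("phys2d/CartPole-v1"))
>>> obs, _ = env.reset(seed=42)  # obs is returned as a torch.Tensor
>>> _ = env.step(torch.tensor(1))  # actions are given as torch Tensors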

Parameters:
  • env – The Jax-based environment to wrap

  • device – The device the torch Tensors should be moved to

class gymnasium.experimental.wrappers.NumpyToTorchV0(env: gym.Env, device: Device | None = None)[source]#

Wraps a numpy-based environment so that it can be interacted with through PyTorch Tensors.

Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors.

Note

Rendered frames are returned as NumPy arrays, not PyTorch Tensors.
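
Example (a minimal sketch, assuming torch is installed)

>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import NumpyToTorchV0
>>> env = NumpyToTorchV0(gym.make("CartPole-v1"))
>>> obs, _ = env.reset(seed=42)  # obs is returned as a torch.Tensor
>>> _ = env.step(torch.tensor(1))  # actions are given as torch Tensors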

Parameters:
  • env – The NumPy-based environment to wrap

  • device – The device the torch Tensors should be moved to