Wrappers#
Observation Wrappers#
- class gymnasium.experimental.wrappers.LambdaObservationV0(env: gym.Env, func: Callable[[ObsType], Any], observation_space: gym.Space | None)#
Transforms an observation via a function provided to the wrapper.
The function func will be applied to all observations. If the observations from func are outside the bounds of the env’s spaces, provide an observation_space.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.experimental.wrappers import LambdaObservationV0
>>> env = gym.make('CartPole-v1')
>>> env = LambdaObservationV0(env, lambda obs: obs + 0.1 * np.random.random(obs.shape), None)
>>> obs, info = env.reset()
>>> obs
array([-0.08319338, 0.04635121, -0.07394746, 0.20877492])
Constructor for the lambda observation wrapper.
- Parameters:
env – The environment to wrap
func – A function that will transform an observation. If the transformed observation is outside env.observation_space, provide an observation_space.
observation_space – The observation space of the wrapper. If None, it is assumed to be the same as env.observation_space.
- class gymnasium.experimental.wrappers.FilterObservationV0(env: gym.Env, filter_keys: Sequence[str | int])#
Filter Dict observation space by the keys.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import FilterObservationV0
>>> env = gym.wrappers.TransformObservation(
...     gym.make('CartPole-v1'), lambda obs: {'obs': obs, 'time': 0}
... )
>>> env.observation_space = gym.spaces.Dict(obs=env.observation_space, time=gym.spaces.Discrete(1))
>>> env.reset()
({'obs': array([-0.00067088, -0.01860439, 0.04772898, -0.01911527], dtype=float32), 'time': 0}, {})
>>> env = FilterObservationV0(env, filter_keys=['obs'])
>>> env.reset()
({'obs': array([ 0.04560107, 0.04466959, -0.0328232 , -0.02367178], dtype=float32)}, {})
>>> env.step(0)
({'obs': array([ 0.04649447, -0.14996664, -0.03329664, 0.25847703], dtype=float32)}, 1.0, False, False, {})
Constructor for an environment with a dictionary observation space where all filter_keys are in the observation space keys.
- class gymnasium.experimental.wrappers.FlattenObservationV0(env: Env)#
Observation wrapper that flattens the observation.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import FlattenObservationV0
>>> env = gym.make('CarRacing-v2')
>>> env.observation_space.shape
(96, 96, 3)
>>> env = FlattenObservationV0(env)
>>> env.observation_space.shape
(27648,)
>>> obs, info = env.reset()
>>> obs.shape
(27648,)
Constructor for any environment’s observation space that implements spaces.utils.flatten_space and spaces.utils.flatten.
- class gymnasium.experimental.wrappers.GrayscaleObservationV0(env: Env, keep_dim: bool = False)#
Observation wrapper that converts an RGB image to grayscale.
The keep_dim argument will keep the channel dimension.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import GrayscaleObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> grayscale_env = GrayscaleObservationV0(env)
>>> grayscale_env.observation_space.shape
(96, 96)
>>> grayscale_env = GrayscaleObservationV0(env, keep_dim=True)
>>> grayscale_env.observation_space.shape
(96, 96, 1)
Constructor for RGB image-based environments that converts the image to grayscale.
- class gymnasium.experimental.wrappers.ResizeObservationV0(env: Env, shape: tuple[int, ...])#
Resizes image observations using OpenCV to shape.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ResizeObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> resized_env = ResizeObservationV0(env, (32, 32))
>>> resized_env.observation_space.shape
(32, 32, 3)
Constructor that requires an image environment observation space with a shape.
- class gymnasium.experimental.wrappers.ReshapeObservationV0(env: gym.Env, shape: int | tuple[int, ...])#
Reshapes array-based observations to the given shape.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ReshapeObservationV0
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> reshape_env = ReshapeObservationV0(env, (24, 4, 96, 1, 3))
>>> reshape_env.observation_space.shape
(24, 4, 96, 1, 3)
Constructor for env with Box observation space that has a shape product equal to the new shape product.
- class gymnasium.experimental.wrappers.RescaleObservationV0(env: gym.Env, min_obs: np.floating | np.integer | np.ndarray, max_obs: np.floating | np.integer | np.ndarray)#
Linearly rescales observation to between a minimum and maximum value.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.experimental.wrappers import RescaleObservationV0
>>> env = gym.make("Pendulum-v1")
>>> env.observation_space
Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)
>>> env = RescaleObservationV0(env, np.array([-2, -1, -10]), np.array([1, 0, 1]))
>>> env.observation_space
Box([-2. -1. -10.], [1. 0. 1.], (3,), float32)
Constructor that requires the env observation space to be a Box.
- class gymnasium.experimental.wrappers.DtypeObservationV0(env: Env, dtype: Any)#
Observation wrapper for transforming the dtype of an observation.
Constructor for Dtype, this is only valid with Box, Discrete, MultiDiscrete and MultiBinary observation spaces.
- class gymnasium.experimental.wrappers.PixelObservationV0(env: Env[ObsType, ActType], pixels_only: bool = True, pixels_key: str = 'pixels', obs_key: str = 'state')#
Augment observations by pixel values.
Observations of this wrapper will be dictionaries of images. You can also choose to add the observation of the base environment to this dictionary. In that case, if the base environment has an observation space of type Dict, the dictionary of rendered images will be updated with the base environment’s observation. If, however, the observation space is of type Box, the base environment’s observation (which will be an element of the Box space) will be added to the dictionary under the key “state”.
Initializes a new pixel Wrapper.
- Parameters:
env – The environment to wrap.
pixels_only (bool) – If True (default), the original observation returned by the wrapped environment will be discarded, and a dictionary observation will only include pixels. If False, the observation dictionary will contain both the original observations and the pixel observations.
pixels_key – Optional custom string specifying the pixel key. Defaults to “pixels”
obs_key – Optional custom string specifying the obs key. Defaults to “state”
- class gymnasium.experimental.wrappers.NormalizeObservationV0(env: Env, epsilon: float = 1e-8)#
This wrapper will normalize observations s.t. each coordinate is centered with unit variance.
Note
The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently.
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon – A stability parameter that is used when scaling the observations.
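The running normalization described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the wrapper's implementation: the class name RunningNormalizer is hypothetical, and the use of Welford's update for the running statistics is an assumption made for the example. It shows where epsilon enters the scaling and why the result depends on the observations seen so far.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: per-coordinate running mean/variance (Welford)."""

    def __init__(self, size, epsilon=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.epsilon = epsilon

    def update(self, obs):
        # Fold one observation into the running statistics.
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        # Variance of everything seen so far; epsilon avoids division by zero.
        var = [m2 / max(self.count, 1) for m2 in self.m2]
        return [
            (x - m) / math.sqrt(v + self.epsilon)
            for x, m, v in zip(obs, self.mean, var)
        ]


normalizer = RunningNormalizer(size=2)
for obs in ([0.0, 10.0], [2.0, 30.0], [4.0, 50.0]):
    normalizer.update(obs)
centered = normalizer.normalize(normalizer.mean)
```

With only a few observations folded in, the statistics are still rough estimates, which matches the note that a newly instantiated wrapper does not normalize correctly.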
- class gymnasium.experimental.wrappers.TimeAwareObservationV0(env: Env, flatten: bool = False, normalize_time: bool = True, *, dict_time_key: str = 'time')#
Augment the observation with time information of the episode.
Time can be represented as a normalized value between [0,1] or by the number of timesteps remaining before truncation occurs.
For environments with Dict or Tuple observation spaces, by default, the time information is automatically added in the key “time” and as the final element in the tuple.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import TimeAwareObservationV0
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservationV0(env)
>>> env.observation_space
Dict(obs: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), time: Box(0.0, 500, (1,), float32))
>>> _ = env.reset()
>>> env.step(env.action_space.sample())[0]
OrderedDict([('obs',
...     array([ 0.02866629, 0.2310988 , -0.02614601, -0.2600732 ], dtype=float32)),
...     ('time', array([0.002]))])
- Flatten observation space example:
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservationV0(env, flatten=True)
>>> env.observation_space
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38 0.0000000e+00], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38 500], (5,), float32)
>>> _ = env.reset()
>>> env.step(env.action_space.sample())[0]
array([-0.01232257, 0.19335455, -0.02244143, -0.32388705, 0.002 ], dtype=float32)
Initialize TimeAwareObservationV0.
- Parameters:
env – The environment to apply the wrapper to
flatten – Flatten the observation to a Box of a single dimension
normalize_time – If True, return time in the range [0,1]; otherwise return time as the remaining timesteps before truncation
dict_time_key – For environments with a Dict observation space, the key for the time space. By default, “time”.
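To make the two time representations concrete, here is a small sketch in plain Python. The helper names time_feature and add_time are hypothetical and the code does not use the wrapper itself. It assumes that normalized time is the elapsed fraction of the episode, which is consistent with the value 0.002 shown after one step of a 500-step CartPole episode in the example above.

```python
def time_feature(elapsed_steps, max_steps, normalize_time=True):
    """Hypothetical helper returning the time value added to an observation."""
    if normalize_time:
        # Fraction of the episode that has elapsed, in [0, 1].
        return elapsed_steps / max_steps
    # Number of timesteps remaining before truncation.
    return max_steps - elapsed_steps


def add_time(obs, time_value, dict_time_key="time"):
    """Attach the time value to a dict or tuple observation."""
    if isinstance(obs, dict):
        return {**obs, dict_time_key: time_value}
    if isinstance(obs, tuple):
        # Time becomes the final element of the tuple.
        return obs + (time_value,)
    # Other observations are wrapped in a dict, as in the CartPole example.
    return {"obs": obs, dict_time_key: time_value}


first_step = time_feature(elapsed_steps=1, max_steps=500)
remaining = time_feature(elapsed_steps=1, max_steps=500, normalize_time=False)
augmented = add_time((0.1, 0.2), first_step)
```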
- class gymnasium.experimental.wrappers.FrameStackObservationV0(env: Env[ObsType, ActType], stack_size: int)#
Observation wrapper that stacks the observations in a rolling manner.
For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. For environment ‘Pendulum-v1’, the original observation is an array with shape [3], so if we stack 4 observations, the processed observation has shape [4, 3].
Note
After reset() is called, the frame buffer will be filled with the initial observation, i.e. the observation returned by reset() will consist of stack_size identical frames.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import FrameStackObservationV0
>>> env = gym.make('CarRacing-v2')
>>> env = FrameStackObservationV0(env, 4)
>>> env.observation_space
Box(4, 96, 96, 3)
>>> obs, info = env.reset()
>>> obs.shape
(4, 96, 96, 3)
- Parameters:
env – The environment to apply the wrapper to
stack_size – The number of frames to stack
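The rolling behaviour described above, including the buffer being filled with the initial observation on reset, can be illustrated with a deque. This is only a sketch of the idea under simplifying assumptions: the FrameBuffer class is hypothetical and observations are plain lists rather than arrays.

```python
from collections import deque


class FrameBuffer:
    """Hypothetical rolling buffer holding the most recent stack_size frames."""

    def __init__(self, stack_size):
        self.frames = deque(maxlen=stack_size)

    def reset(self, first_obs):
        # Fill the whole buffer with the initial observation.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_obs)
        return list(self.frames)

    def step(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(obs)
        return list(self.frames)


buffer = FrameBuffer(stack_size=4)
stacked_reset = buffer.reset([0.0, 0.0, 0.0])  # four identical frames
stacked_step = buffer.step([1.0, 1.0, 1.0])    # newest frame is last
```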
- class gymnasium.experimental.wrappers.DelayObservationV0(env: Env, delay: int)#
Wrapper which adds a delay to the returned observation.
Initialize the DelayObservation wrapper.
- Parameters:
env (Env) – the wrapped environment
delay (int) – number of timesteps for delaying the observation. Before reaching the delay number of timesteps, the returned observation is an array of zeros with the same shape as the observation space.
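The delay semantics described above (zeros until enough timesteps have passed, then the observation from delay steps earlier) can be sketched with a queue. The ObservationDelay class is a hypothetical illustration using plain lists, not the wrapper's own code.

```python
from collections import deque


class ObservationDelay:
    """Hypothetical helper that delays observations by a fixed number of steps."""

    def __init__(self, delay, obs_size):
        self.delay = delay
        self.zeros = [0.0] * obs_size
        self.queue = deque()

    def push(self, obs):
        self.queue.append(obs)
        if len(self.queue) > self.delay:
            # Enough history: release the observation from `delay` steps ago.
            return self.queue.popleft()
        # Not enough history yet: return zeros of the same shape.
        return list(self.zeros)


delayer = ObservationDelay(delay=2, obs_size=1)
outputs = [delayer.push([float(t)]) for t in range(1, 5)]
```

Here the first two outputs are zeros, and the third output is the observation that was pushed first.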
- class gymnasium.experimental.wrappers.AtariPreprocessingV0(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)#
Atari 2600 preprocessing wrapper.
This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.
Specifically, the following preprocessing stages apply to the Atari environment:
Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.
Frame skipping: The number of frames skipped between steps, 4 by default.
Max-pooling: Pools over the most recent two observations from the frame skips.
Termination signal when a life is lost: When the agent loses a life during the episode, the environment is terminated. Turned off by default. Not recommended by Machado et al. (2018).
Resize to a square image: Resizes the Atari environment’s original observation shape from 210x160 to 84x84 by default.
Grayscale observation: Whether the observation is colour or greyscale; greyscale by default.
Scale observation: Whether to scale the observation to [0, 1) or leave it in [0, 255); not scaled by default.
Wrapper for Atari 2600 preprocessing.
- Parameters:
env (Env) – The environment to apply the preprocessing
noop_max (int) – For No-op reset, the maximum number of no-op actions taken at reset; to turn off, set to 0.
frame_skip (int) – The number of frames between new observations, affecting the frequency at which the agent experiences the game.
screen_size (int) – The size to which the Atari frame is resized
terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.
grayscale_obs (bool) – if True, then gray scale observation is returned, otherwise, RGB observation is returned.
grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.
scale_obs (bool) – if True, then observation normalized in range [0,1) is returned. It also limits memory optimization benefits of FrameStack Wrapper.
- Raises:
DependencyNotInstalled – opencv-python package not installed
ValueError – Disable frame-skipping in the original env
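As a rough illustration of the frame skipping and max-pooling stages listed above, the sketch below takes the raw frames produced during one agent step and returns the element-wise maximum of the last two. The function skip_and_pool is hypothetical, frames are flat lists of pixel values for brevity, and frame_skip is assumed to be at least 2; the wrapper's actual code may differ.

```python
def skip_and_pool(frames, frame_skip=4):
    """Hypothetical helper: build one observation from frame_skip raw frames.

    Each frame is a flat list of pixel intensities. The result is the
    element-wise maximum of the last two frames in the skip window.
    """
    window = frames[:frame_skip]
    return [max(a, b) for a, b in zip(window[-2], window[-1])]


raw_frames = [
    [0, 0, 0],
    [10, 0, 0],
    [0, 200, 5],
    [7, 0, 90],
]
pooled = skip_and_pool(raw_frames)
```

Pooling over two consecutive frames keeps objects that are only drawn on alternating frames visible in the resulting observation.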
Action Wrappers#
- class gymnasium.experimental.wrappers.LambdaActionV0(env: gym.Env, func: Callable[[WrapperActType], ActType], action_space: Space | None)#
A wrapper that provides a function to modify the action passed to step().
Initialize LambdaAction.
- Parameters:
env – The gymnasium environment
func – Function to apply to the step() action
action_space – The updated action space of the wrapper given the function.
- class gymnasium.experimental.wrappers.ClipActionV0(env: Env)#
Clip the continuous action within the valid Box action space bound.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.experimental.wrappers import ClipActionV0
>>> env = gym.make('BipedalWalker-v3', disable_env_checker=True)
>>> env = ClipActionV0(env)
>>> env.action_space
Box(-1.0, 1.0, (4,), float32)
>>> env.step(np.array([5.0, 2.0, -10.0, 0.0]))
# Executes the action np.array([1.0, 1.0, -1.0, 0]) in the base environment
A wrapper for clipping continuous actions within the valid bound.
- Parameters:
env – The environment to apply the wrapper to
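The clipping in the example above amounts to an element-wise clamp of each action component to the bounds of the Box. A minimal sketch in plain Python follows; the helper clip_action is hypothetical and the bounds are those of the BipedalWalker action space shown in the example.

```python
def clip_action(action, low, high):
    """Hypothetical helper: clip each action component to [low, high]."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


low = [-1.0] * 4
high = [1.0] * 4
clipped = clip_action([5.0, 2.0, -10.0, 0.0], low, high)
```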
- class gymnasium.experimental.wrappers.RescaleActionV0(env: gym.Env, min_action: float | int | np.ndarray, max_action: float | int | np.ndarray)#
Affinely rescales the continuous action space of the environment to the range [min_action, max_action].
The base environment env must have an action space of type spaces.Box. If min_action or max_action are numpy arrays, the shape must match the shape of the environment’s action space.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.experimental.wrappers import RescaleActionV0
>>> env = gym.make('BipedalWalker-v3', disable_env_checker=True)
>>> _ = env.reset(seed=42)
>>> obs, _, _, _, _ = env.step(np.array([1,1,1,1]))
>>> _ = env.reset(seed=42)
>>> min_action = -0.5
>>> max_action = np.array([0.0, 0.5, 1.0, 0.75])
>>> wrapped_env = RescaleActionV0(env, min_action=min_action, max_action=max_action)
>>> wrapped_env_obs, _, _, _, _ = wrapped_env.step(max_action)
>>> np.all(obs == wrapped_env_obs)
True
Initializes the RescaleAction wrapper.
- Parameters:
env (Env) – The environment to apply the wrapper to
min_action (float, int or np.ndarray) – The min values for each action. This may be a numpy array or a scalar.
max_action (float, int or np.ndarray) – The max values for each action. This may be a numpy array or a scalar.
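The affine rescaling can be written out explicitly. The sketch below assumes the straightforward linear map from [min_action, max_action] onto the base environment's bounds [low, high]; the function rescale_action is hypothetical and works on plain lists. It reproduces the property used in the example above: passing max_action to the wrapped environment corresponds to the upper bound of the base action space.

```python
def rescale_action(action, min_action, max_action, low, high):
    """Hypothetical helper: map [min_action, max_action] onto [low, high]."""
    return [
        lo + (hi - lo) * (a - mn) / (mx - mn)
        for a, mn, mx, lo, hi in zip(action, min_action, max_action, low, high)
    ]


min_action = [-0.5, -0.5, -0.5, -0.5]
max_action = [0.0, 0.5, 1.0, 0.75]
low, high = [-1.0] * 4, [1.0] * 4  # bounds of the base action space

at_max = rescale_action(max_action, min_action, max_action, low, high)
at_min = rescale_action(min_action, min_action, max_action, low, high)
```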
- class gymnasium.experimental.wrappers.StickyActionV0(env: Env, repeat_action_probability: float)#
Wrapper which adds a probability of repeating the previous action.
This wrapper follows the implementation proposed by Machado et al., 2018 in Section 5.2 on page 12.
Initialize StickyAction wrapper.
- Parameters:
env (Env) – the wrapped environment
repeat_action_probability (int | float) – a probability of repeating the old action.
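The idea of sticky actions can be sketched as follows: at each step, with the given probability, the previously executed action is used instead of the requested one. The StickyActionSelector class, its seed argument, and the accepted probability range are assumptions made for this illustration and are not taken from the wrapper.

```python
import random


class StickyActionSelector:
    """Hypothetical helper: repeat the previous action with a fixed probability."""

    def __init__(self, repeat_action_probability, seed=None):
        if not 0.0 <= repeat_action_probability <= 1.0:
            raise ValueError("repeat_action_probability must be in [0, 1]")
        self.p = repeat_action_probability
        self.rng = random.Random(seed)
        self.last_action = None

    def reset(self):
        # Forget the previous action at the start of an episode.
        self.last_action = None

    def select(self, action):
        # Repeat the old action with probability p, if there is one.
        if self.last_action is not None and self.rng.random() < self.p:
            action = self.last_action
        self.last_action = action
        return action


never = StickyActionSelector(0.0, seed=0)
always = StickyActionSelector(1.0, seed=0)
never_out = [never.select(a) for a in (0, 1, 2)]
always_out = [always.select(a) for a in (0, 1, 2)]
```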
Reward Wrappers#
- class gymnasium.experimental.wrappers.LambdaRewardV0(env: Env, func: Callable[[SupportsFloat], SupportsFloat])#
A reward wrapper that allows a custom function to modify the step reward.
Example
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import LambdaRewardV0
>>> env = gym.make("CartPole-v1")
>>> env = LambdaRewardV0(env, lambda r: 2 * r + 1)
>>> _ = env.reset()
>>> _, rew, _, _, _ = env.step(0)
>>> rew
3.0
Initialize LambdaRewardV0 wrapper.
- Parameters:
env (Env) – The environment to apply the wrapper to
func (Callable) – The function to apply to the reward
- class gymnasium.experimental.wrappers.ClipRewardV0(env: gym.Env, min_reward: float | np.ndarray | None = None, max_reward: float | np.ndarray | None = None)#
A wrapper that clips the rewards for an environment between an upper and lower bound.
- Example with an upper and lower bound:
>>> import gymnasium as gym
>>> from gymnasium.experimental.wrappers import ClipRewardV0
>>> env = gym.make("CartPole-v1")
>>> env = ClipRewardV0(env, 0, 0.5)
>>> _ = env.reset()
>>> _, rew, _, _, _ = env.step(1)
>>> rew
0.5
Initialize ClipRewardV0 wrapper.
- Parameters:
env (Env) – The environment to apply the wrapper to
min_reward (Union[float, np.ndarray]) – lower bound to apply
max_reward (Union[float, np.ndarray]) – upper bound to apply
- class gymnasium.experimental.wrappers.NormalizeRewardV0(env: Env, gamma: float = 0.99, epsilon: float = 1e-8)#
This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon (float) – A stability parameter
gamma (float) – The discount factor that is used in the exponential moving average.
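One common way to realise this kind of reward scaling is to track a discounted running return and divide each reward by the standard deviation of that return. The sketch below follows that scheme as an assumption; the RewardScaler class is hypothetical and is not claimed to match the wrapper's implementation. It does show how gamma and epsilon are used, and why early rewards are scaled poorly.

```python
import math


class RewardScaler:
    """Hypothetical helper: scale rewards by the std of a discounted return."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations of the return

    def scale(self, reward, terminated=False):
        # Exponentially discounted running return.
        self.discounted_return = self.gamma * self.discounted_return + reward
        # Welford update of the running variance of the return.
        self.count += 1
        delta = self.discounted_return - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.discounted_return - self.mean)
        variance = self.m2 / self.count
        if terminated:
            # Start a fresh return for the next episode.
            self.discounted_return = 0.0
        return reward / math.sqrt(variance + self.epsilon)


scaler = RewardScaler(gamma=0.5)
scaled = [scaler.scale(1.0) for _ in range(3)]
```

The first scaled reward is very large because the variance estimate is still zero and only epsilon keeps the division finite, which illustrates the note about newly instantiated wrappers.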
Other Wrappers#
- class gymnasium.experimental.wrappers.AutoresetV0(env: Env)#
A class for providing an automatic reset functionality for gymnasium environments when calling self.step().
- Parameters:
env (gym.Env) – The environment to apply the wrapper to
- class gymnasium.experimental.wrappers.PassiveEnvCheckerV0(env: Env[ObsType, ActType])#
A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gymnasium API.
Initialises the wrapper with the environment and runs the observation and action space tests.
- class gymnasium.experimental.wrappers.OrderEnforcingV0(env: Env, disable_render_order_enforcing: bool = False)#
A wrapper that will produce an error if step() is called before an initial reset().
Example
>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> from gymnasium.experimental.wrappers import OrderEnforcingV0
>>> env = CartPoleEnv()
>>> env = OrderEnforcingV0(env)
>>> env.step(0)
ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
ResetNeeded: Cannot call env.render() before calling env.reset()
>>> _ = env.reset()
>>> env.render()
>>> _ = env.step(0)
- Parameters:
env – The environment to wrap
disable_render_order_enforcing – Whether to disable render order enforcing
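The order check itself is simple bookkeeping: remember whether reset() has been called and refuse step() or render() until it has. The sketch below uses hypothetical names (OrderGuard, ResetNeededError) and is only meant to show the logic, including the effect of disable_render_order_enforcing.

```python
class ResetNeededError(RuntimeError):
    """Hypothetical stand-in for the ResetNeeded error shown in the example."""


class OrderGuard:
    """Hypothetical helper that tracks whether reset() has been called."""

    def __init__(self, disable_render_order_enforcing=False):
        self.has_reset = False
        self.disable_render_order_enforcing = disable_render_order_enforcing

    def reset(self):
        self.has_reset = True

    def check_step(self):
        if not self.has_reset:
            raise ResetNeededError("Cannot call env.step() before calling env.reset()")

    def check_render(self):
        # Render order is only enforced when it has not been disabled.
        if not self.disable_render_order_enforcing and not self.has_reset:
            raise ResetNeededError("Cannot call env.render() before calling env.reset()")


guard = OrderGuard()
try:
    guard.check_step()
    step_was_blocked = False
except ResetNeededError:
    step_was_blocked = True
guard.reset()
guard.check_step()  # no error once reset() has been called
```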
- class gymnasium.experimental.wrappers.RecordEpisodeStatisticsV0(env: Env[ObsType, ActType], buffer_length: int | None = 100, stats_key: str = 'episode')#
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, which indicates whether the env at the respective index has the episode statistics.
After the completion of an episode, info will look like this:
>>> info = {
...     ...
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }
For vectorized environments, the output will be in the form of:
>>> infos = {
...     ...
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
episode_reward_buffer – The cumulative rewards of the last deque_size-many episodes
episode_length_buffer – The lengths of the last deque_size-many episodes
- Parameters:
env (Env) – The environment to apply the wrapper to
buffer_length – The size of the buffers return_queue and length_queue
stats_key – The info key for the episode statistics
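The bookkeeping behind these statistics can be sketched as follows: accumulate reward and length during the episode, and when the episode ends, write them to info under stats_key and push them into bounded queues. The EpisodeStats class is hypothetical and handles only a single, non-vectorized environment; the queue names mirror return_queue and length_queue from the text above.

```python
import time
from collections import deque


class EpisodeStats:
    """Hypothetical helper accumulating per-episode return and length."""

    def __init__(self, buffer_length=100, stats_key="episode"):
        self.stats_key = stats_key
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self._start_episode()

    def _start_episode(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.start_time = time.perf_counter()

    def record(self, reward, terminated, truncated, info):
        self.episode_return += reward
        self.episode_length += 1
        if terminated or truncated:
            # Episode finished: publish the statistics and start a new one.
            info = dict(info)
            info[self.stats_key] = {
                "r": self.episode_return,
                "l": self.episode_length,
                "t": time.perf_counter() - self.start_time,
            }
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
            self._start_episode()
        return info


stats = EpisodeStats(buffer_length=2)
info = {}
for reward, done in [(1.0, False), (1.0, False), (1.0, True)]:
    info = stats.record(reward, terminated=done, truncated=False, info={})
```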
Rendering Wrappers#
- class gymnasium.experimental.wrappers.RecordVideoV0(env: Env[ObsType, ActType])#
Record a video of an environment.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- class gymnasium.experimental.wrappers.HumanRenderingV0(env)#
Performs human rendering for an environment that only supports “rgb_array” rendering.
This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven’t implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.
The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.
Example
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRenderingV0(env)
>>> wrapped.reset()  # This will start rendering to the screen
The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").
Example
>>> env = gym.make("NoNativeRendering-v2", render_mode="human")  # NoNativeRendering-v2 doesn't implement human-rendering natively
>>> env.reset()  # This will start rendering to the screen
- Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment’s) render method will always return an empty list:
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRenderingV0(env)
>>> wrapped.reset()
>>> env.render()
[]  # env.render() will always return an empty list!
Initialize a HumanRendering instance.
- Parameters:
env – The environment that is being wrapped
- class gymnasium.experimental.wrappers.RenderCollectionV0(env: Env[ObsType, ActType], pop_frames: bool = True, reset_clean: bool = True)#
Collect rendered frames of an environment such that render returns a list[RenderedFrame].
Initialize a RenderCollection instance.
- Parameters:
env – The environment that is being wrapped
pop_frames (bool) – If true, clear the collection frames after render() is called. Default value is True.
reset_clean (bool) – If true, clear the collection frames when reset() is called. Default value is True.
Environment data conversion#
- class gymnasium.experimental.wrappers.JaxToNumpyV0(env: Env)#
Wraps a jax environment so that it can be interacted with through numpy arrays.
Actions must be provided as numpy arrays and observations will be returned as numpy arrays.
Notes
The Jax To Numpy and Numpy to Jax conversion does not guarantee a roundtrip (jax -> numpy -> jax) and vice versa. The reason for this is that jax does not support non-array values; therefore, numpy int32(5) -> DeviceArray([5], dtype=jnp.int32)
Wraps an environment such that the input and outputs are numpy arrays.
- Parameters:
env – the environment to wrap
- class gymnasium.experimental.wrappers.JaxToTorchV0(env: Env, device: Device | None = None)#
Wraps a jax-based environment so that it can be interacted with through PyTorch Tensors.
Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors.
Note
For rendered, this is returned as a NumPy array, not a PyTorch Tensor.
Wrapper class to change inputs and outputs of environment to PyTorch tensors.
- Parameters:
env – The Jax-based environment to wrap
device – The device the torch Tensors should be moved to
- class gymnasium.experimental.wrappers.NumpyToTorchV0(env: Env, device: Device | None = None)#
Wraps a numpy-based environment so that it can be interacted with through PyTorch Tensors.
Actions must be provided as PyTorch Tensors and observations will be returned as PyTorch Tensors.
Note
For rendered, this is returned as a NumPy array, not a PyTorch Tensor.
Wrapper class to change inputs and outputs of environment to PyTorch tensors.
- Parameters:
env – The NumPy-based environment to wrap
device – The device the torch Tensors should be moved to