Misc Wrappers#
- class gymnasium.wrappers.AtariPreprocessing(env: Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = False, grayscale_obs: bool = True, grayscale_newaxis: bool = False, scale_obs: bool = False)#
Atari 2600 preprocessing wrapper.
This class follows the guidelines in Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents”.
Specifically, the following preprocessing stages are applied to the Atari environment:
- Noop Reset: Obtains the initial state by taking a random number of no-ops on reset, default max 30 no-ops.
- Frame skipping: The number of frames skipped between steps, 4 by default.
- Max-pooling: Pools over the most recent two observations from the frame skips.
- Termination signal when a life is lost: When the agent loses a life, the episode is terminated. Turned off by default. Not recommended by Machado et al. (2018).
- Resize to a square image: Resizes the original Atari observation shape from 210x160 to 84x84 by default.
- Grayscale observation: Whether the observation is returned in colour or grayscale; grayscale by default.
- Scale observation: Whether to scale the observation to the range [0, 1) or leave it in [0, 255]; not scaled by default.
Wrapper for Atari 2600 preprocessing.
- Parameters:
env (Env) – The environment to apply the preprocessing
noop_max (int) – For no-op reset, the maximum number of no-op actions taken at reset; set to 0 to turn off.
frame_skip (int) – The number of frames between new observations; this affects the frequency at which the agent experiences the game.
screen_size (int) – The size to which the Atari frame is resized (screen_size x screen_size).
terminal_on_life_loss (bool) – if True, then step() returns terminated=True whenever a life is lost.
grayscale_obs (bool) – if True, then a grayscale observation is returned; otherwise, an RGB observation is returned.
grayscale_newaxis (bool) – if True and grayscale_obs=True, then a channel axis is added to grayscale observations to make them 3-dimensional.
scale_obs (bool) – if True, then the observation is normalized to the range [0, 1). This also limits the memory optimization benefits of the FrameStack wrapper.
- Raises:
DependencyNotInstalled – If the opencv-python package is not installed
ValueError – If frame-skipping is not disabled in the original environment (e.g. by using the NoFrameskip variant)
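The max-pooling stage described above can be illustrated in isolation. The sketch below is not the wrapper's actual implementation; it only shows, assuming NumPy arrays as frames, how the element-wise maximum over the two most recent frames would be computed:

```python
import numpy as np

def max_pool_frames(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Element-wise maximum over the two most recent frames.

    Pooling suppresses flickering sprites that some Atari games
    draw only on alternating frames.
    """
    return np.maximum(frame_a, frame_b)

# Two toy 2x2 grayscale "frames" with flickering pixels
a = np.array([[0, 255], [10, 0]], dtype=np.uint8)
b = np.array([[255, 0], [0, 20]], dtype=np.uint8)
pooled = max_pool_frames(a, b)
# pooled == [[255, 255], [10, 20]]
```

The real wrapper applies this pooling to the last two raw frames of each frame-skip window before resizing and grayscaling.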
- class gymnasium.wrappers.AutoResetWrapper(env: Env)#
A class for providing an automatic reset functionality for gymnasium environments when calling self.step().

When a call to step causes Env.step() to return terminated=True or truncated=True, Env.reset() is called, and the return format of self.step() is as follows: (new_obs, final_reward, final_terminated, final_truncated, info) with the new step API and (new_obs, final_reward, final_done, info) with the old step API.

- new_obs is the first observation after calling self.env.reset()
- final_reward is the reward after calling self.env.step(), prior to calling self.env.reset()
- final_terminated is the terminated value before calling self.env.reset()
- final_truncated is the truncated value before calling self.env.reset(); final_terminated and final_truncated cannot both be False
- info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with an additional key "final_observation" containing the observation returned by the last call to self.env.step() and "final_info" containing the info dict returned by the last call to self.env.step()
- Warning: When using this wrapper to collect rollouts, note that when Env.step() returns terminated or truncated, a new observation from after calling Env.reset() is returned by Env.step() alongside the final reward, terminated and truncated state from the previous episode. If you need the final state from the previous episode, you need to retrieve it via the "final_observation" key in the info dict. Make sure you know what you're doing if you use this wrapper!
A class for providing an automatic reset functionality for gymnasium environments when calling self.step().
- Parameters:
env (gym.Env) – The environment to apply the wrapper
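The auto-reset flow described above can be sketched with a toy environment. This is a simplified illustration of the documented behaviour under the new step API, not the wrapper's actual implementation, and ToyEnv is a hypothetical stand-in:

```python
class ToyEnv:
    """A hypothetical environment that terminates after 3 steps."""
    def reset(self):
        self.t = 0
        return 0, {}  # observation, info
    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return self.t, 1.0, terminated, False, {}

class AutoResetSketch:
    """Illustrative auto-reset: on termination/truncation, reset the env
    and stash the final observation/info in the info dict."""
    def __init__(self, env):
        self.env = env
    def reset(self):
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            final_obs, final_info = obs, info
            obs, info = self.env.reset()
            info = dict(info, final_observation=final_obs, final_info=final_info)
        return obs, reward, terminated, truncated, info

env = AutoResetSketch(ToyEnv())
obs, _ = env.reset()
for _ in range(3):
    obs, reward, terminated, truncated, info = env.step(0)
# On the terminating step, obs is the first observation of the NEW
# episode; the last observation sits under info["final_observation"].
```

This mirrors the warning above: after a terminal step, the returned observation already belongs to the next episode.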
- class gymnasium.wrappers.EnvCompatibility(old_env: LegacyEnv, render_mode: Optional[str] = None)#
A wrapper which can transform an environment from the old API to the new API.
Old step API refers to step() method returning (observation, reward, done, info), and reset() only returning the observation. New step API refers to step() method returning (observation, reward, terminated, truncated, info) and reset() returning (observation, info). (Refer to docs for details on the API change)
Known limitations:
- Environments that use self.np_random might not work as expected.
A wrapper which converts old-style envs to valid modern envs.
Some information may be lost in the conversion, so we recommend updating your environment.
- Parameters:
old_env (LegacyEnv) – the env to wrap, implemented with the old API
render_mode (str) – the render mode to use when rendering the environment, passed automatically to env.render
- class gymnasium.wrappers.StepAPICompatibility(env: Env, output_truncation_bool: bool = True)#
A wrapper which can transform an environment from new step API to old and vice-versa.
Old step API refers to step() method returning (observation, reward, done, info). New step API refers to step() method returning (observation, reward, terminated, truncated, info). (Refer to docs for details on the API change)
- Parameters:
env (gym.Env) – the env to wrap. Can be in old or new API
output_truncation_bool (bool) – If True, the environment is converted to the new step API, which returns the two booleans terminated and truncated; if False, the old step API with a single done boolean is used. (True by default)
Examples
>>> env = gym.make("CartPole-v1")
>>> env  # wrapper not applied by default, set to new API
<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
>>> env = gym.make("CartPole-v1", apply_api_compatibility=True)  # set to old API
<StepAPICompatibility<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>>
>>> env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)  # manually using wrapper on unregistered envs
A wrapper which can transform an environment from new step API to old and vice-versa.
- Parameters:
env (gym.Env) – the env to wrap. Can be in old or new API
output_truncation_bool (bool) – Whether the wrapper’s step method outputs two booleans (new API) or one boolean (old API)
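The core of the old-to-new conversion can be sketched as follows. This is an illustrative approximation, not gymnasium's exact implementation; it assumes the common convention that a time-limit truncation is flagged in info under the "TimeLimit.truncated" key:

```python
def convert_to_new_api(step_returns):
    """Convert an old-style (obs, reward, done, info) tuple to the new
    (obs, reward, terminated, truncated, info) form.

    Sketch only: a done caused purely by a time limit becomes
    truncated=True, terminated=False; any other done becomes
    terminated=True.
    """
    obs, reward, done, info = step_returns
    truncated = info.pop("TimeLimit.truncated", False)
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info

# A step that ended because of a time limit:
new = convert_to_new_api((1, 0.0, True, {"TimeLimit.truncated": True}))
# new == (1, 0.0, False, True, {})
```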
- class gymnasium.wrappers.PassiveEnvChecker(env)#
A passive environment checker wrapper that surrounds the step, reset and render functions to check they follow the gymnasium API.
Initialises the wrapper with the environment, running the observation and action space tests.
- class gymnasium.wrappers.HumanRendering(env)#
Performs human rendering for an environment that only supports "rgb_array" rendering.
This wrapper is particularly useful when you have implemented an environment that can produce RGB images but haven't implemented any code to render the images to the screen. If you want to use this wrapper with your environments, remember to specify "render_fps" in the metadata of your environment.

The render_mode of the wrapped environment must be either 'rgb_array' or 'rgb_array_list'.

Example
>>> env = gym.make("LunarLander-v2", render_mode="rgb_array")
>>> wrapped = HumanRendering(env)
>>> wrapped.reset()  # This will start rendering to the screen
The wrapper can also be applied directly when the environment is instantiated, simply by passing render_mode="human" to make. The wrapper will only be applied if the environment does not implement human-rendering natively (i.e. render_mode does not contain "human").

Example

>>> env = gym.make("NoNativeRendering-v2", render_mode="human")  # NoNativeRendering-v2 doesn't implement human-rendering natively
>>> env.reset()  # This will start rendering to the screen
- Warning: If the base environment uses render_mode="rgb_array_list", its (i.e. the base environment's) render method will always return an empty list:

>>> env = gym.make("LunarLander-v2", render_mode="rgb_array_list")
>>> wrapped = HumanRendering(env)
>>> wrapped.reset()
>>> env.render()  # env.render() will always return an empty list!
[]
Initialize a HumanRendering instance.
- Parameters:
env – The environment that is being wrapped
- class gymnasium.wrappers.OrderEnforcing(env: Env, disable_render_order_enforcing: bool = False)#
A wrapper that will produce an error if step() is called before an initial reset().

Example

>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> env = CartPoleEnv()
>>> env = OrderEnforcing(env)
>>> env.step(0)
ResetNeeded: Cannot call env.step() before calling env.reset()
>>> env.render()
ResetNeeded: Cannot call env.render() before calling env.reset()
>>> env.reset()
>>> env.render()
>>> env.step(0)
A wrapper that will produce an error if step() is called before an initial reset().
- Parameters:
env – The environment to wrap
disable_render_order_enforcing – Whether to disable render order enforcing
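The order-enforcing logic itself is simple to sketch. The following is a minimal illustration, not gymnasium's actual code; ResetNeeded and ToyEnv are hypothetical stand-ins:

```python
class ResetNeeded(Exception):
    """Hypothetical stand-in for gymnasium's ResetNeeded error."""

class OrderEnforcingSketch:
    """Raise an error if step() is called before the first reset()."""
    def __init__(self, env):
        self.env = env
        self._has_reset = False
    def reset(self, **kwargs):
        self._has_reset = True
        return self.env.reset(**kwargs)
    def step(self, action):
        if not self._has_reset:
            raise ResetNeeded("Cannot call env.step() before calling env.reset()")
        return self.env.step(action)

class ToyEnv:
    def reset(self):
        return 0, {}
    def step(self, action):
        return 0, 0.0, False, False, {}

env = OrderEnforcingSketch(ToyEnv())
try:
    env.step(0)  # raises ResetNeeded
except ResetNeeded as e:
    message = str(e)
env.reset()
obs, *_ = env.step(0)  # fine after reset
```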
- class gymnasium.wrappers.RecordEpisodeStatistics(env: Env, deque_size: int = 100)#
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of an episode, the statistics of the episode will be added to info using the key episode. If using a vectorized environment, the key _episode is also used, indicating whether the env at the respective index has the episode statistics.

After the completion of an episode, info will look like this:

>>> info = {
...     ...
...     "episode": {
...         "r": "<cumulative reward>",
...         "l": "<episode length>",
...         "t": "<elapsed time since beginning of episode>"
...     },
... }
For vectorized environments, the output will be in the form of:

>>> infos = {
...     ...
...     "episode": {
...         "r": "<array of cumulative reward>",
...         "l": "<array of episode length>",
...         "t": "<array of elapsed time since beginning of episode>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
return_queue – The cumulative rewards of the last deque_size-many episodes
length_queue – The lengths of the last deque_size-many episodes
This wrapper will keep track of cumulative rewards and episode lengths.
- Parameters:
env (Env) – The environment to apply the wrapper
deque_size – The size of the buffers return_queue and length_queue
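The bookkeeping described above can be sketched like this. It is a simplified illustration of the documented behaviour, not the wrapper's actual implementation, and ToyEnv is a hypothetical stand-in:

```python
import time
from collections import deque

class EpisodeStatsSketch:
    """Track cumulative reward, length, and elapsed time per episode."""
    def __init__(self, env, deque_size=100):
        self.env = env
        self.return_queue = deque(maxlen=deque_size)
        self.length_queue = deque(maxlen=deque_size)
    def reset(self):
        self.episode_return = 0.0
        self.episode_length = 0
        self.episode_start = time.perf_counter()
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.episode_return += reward
        self.episode_length += 1
        if terminated or truncated:
            info = dict(info)
            info["episode"] = {
                "r": self.episode_return,
                "l": self.episode_length,
                "t": time.perf_counter() - self.episode_start,
            }
            self.return_queue.append(self.episode_return)
            self.length_queue.append(self.episode_length)
        return obs, reward, terminated, truncated, info

class ToyEnv:
    """Hypothetical env giving reward 1.0 and terminating after 2 steps."""
    def reset(self):
        self.t = 0
        return 0, {}
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, False, {}

env = EpisodeStatsSketch(ToyEnv())
env.reset()
env.step(0)
_, _, _, _, info = env.step(0)
# info["episode"]["r"] == 2.0, info["episode"]["l"] == 2
```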
- class gymnasium.wrappers.RecordVideo(env: Env, video_folder: str, episode_trigger: Optional[Callable[[int], bool]] = None, step_trigger: Optional[Callable[[int], bool]] = None, video_length: int = 0, name_prefix: str = 'rl-video', disable_logger: bool = False)#
This wrapper records videos of rollouts.
Usually, you only want to record episodes intermittently, say every hundredth episode. To do this, you can specify either episode_trigger or step_trigger (not both). They should be functions returning a boolean that indicates whether a recording should be started at the current episode or step, respectively. If neither episode_trigger nor step_trigger is passed, a default episode_trigger will be employed. By default, the recording will be stopped once a terminated or truncated signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for video_length.

Wrapper records videos of rollouts.
- Parameters:
env – The environment that will be wrapped
video_folder (str) – The folder where the recordings will be stored
episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode
step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step
video_length (int) – The length of recorded episodes. If 0, entire episodes are recorded. Otherwise, snippets of the specified length are captured
name_prefix (str) – Will be prepended to the filename of the recordings
disable_logger (bool) – Whether to disable moviepy logger or not.
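An episode_trigger is just a predicate over the episode index. For instance, to record every hundredth episode (the function name below is user-defined, not part of the API):

```python
def every_hundredth(episode_id: int) -> bool:
    """Record episodes 0, 100, 200, ..."""
    return episode_id % 100 == 0

# Would be passed as, e.g.:
#   RecordVideo(env, video_folder="videos", episode_trigger=every_hundredth)
recorded = [i for i in range(250) if every_hundredth(i)]
# recorded == [0, 100, 200]
```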
- class gymnasium.wrappers.RenderCollection(env: Env, pop_frames: bool = True, reset_clean: bool = True)#
Save collection of render frames.
Initialize a RenderCollection instance.
- Parameters:
env – The environment that is being wrapped
pop_frames (bool) – If true, clear the collection frames after .render() is called. Default value is True.
reset_clean (bool) – If true, clear the collection frames when .reset() is called. Default value is True.
- class gymnasium.wrappers.TimeLimit(env: Env, max_episode_steps: Optional[int] = None)#
This wrapper will issue a truncated signal if a maximum number of timesteps is exceeded.
If a truncation is not defined inside the environment itself, this is the only place that the truncation signal is issued. Critically, this is different from the terminated signal that originates from the underlying environment as part of the MDP.
Example
>>> from gymnasium.envs.classic_control import CartPoleEnv
>>> from gymnasium.wrappers import TimeLimit
>>> env = CartPoleEnv()
>>> env = TimeLimit(env, max_episode_steps=1000)
Initializes the TimeLimit wrapper with an environment and the number of steps after which truncation will occur.
- Parameters:
env – The environment to apply the wrapper
max_episode_steps – An optional max episode steps (if None, env.spec.max_episode_steps is used)
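The truncation logic described above can be sketched as follows. This is a simplified illustration, not gymnasium's actual implementation, and NeverEndingEnv is a hypothetical stand-in:

```python
class TimeLimitSketch:
    """Issue truncated=True once max_episode_steps have elapsed."""
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # Truncation is separate from termination: the MDP did not
            # end, we simply stopped interacting with it.
            truncated = True
        return obs, reward, terminated, truncated, info

class NeverEndingEnv:
    def reset(self):
        return 0, {}
    def step(self, action):
        return 0, 0.0, False, False, {}

env = TimeLimitSketch(NeverEndingEnv(), max_episode_steps=3)
env.reset()
results = [env.step(0) for _ in range(3)]
truncations = [r[3] for r in results]
# truncations == [False, False, True]
```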
- class gymnasium.wrappers.VectorListInfo(env)#
Converts infos of vectorized environments from dict to List[dict].
This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. It is intended to be used around vectorized environments. If other wrappers that operate on info, such as RecordEpisodeStatistics, are also used, this needs to be the outermost wrapper, i.e. VectorListInfo(RecordEpisodeStatistics(envs)).
Example:
>>> # actual
>>> {
...     "k": np.array([0., 0., 0.5, 0.3]),
...     "_k": np.array([False, False, True, True])
... }
>>> # classic
>>> [{}, {}, {"k": 0.5}, {"k": 0.3}]
This wrapper will convert the info into the list format.
- Parameters:
env (Env) – The environment to apply the wrapper
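The dict-to-list conversion shown in the example can be sketched in plain Python. This is a simplified illustration that ignores nested info dicts, not the wrapper's actual implementation:

```python
def dict_info_to_list(infos: dict, num_envs: int) -> list:
    """Convert vectorized dict-style infos to a list of per-env dicts.

    Keys prefixed with "_" are boolean masks marking which
    sub-environments actually produced the corresponding value.
    """
    list_info = [{} for _ in range(num_envs)]
    for key, values in infos.items():
        if key.startswith("_"):
            continue  # masks are consumed below, not copied over
        mask = infos.get("_" + key, [True] * num_envs)
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info

infos = {"k": [0.0, 0.0, 0.5, 0.3], "_k": [False, False, True, True]}
converted = dict_info_to_list(infos, num_envs=4)
# converted == [{}, {}, {"k": 0.5}, {"k": 0.3}]
```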