Utils#
Visualization#
- gymnasium.utils.play.play(env: Env, transpose: bool | None = True, fps: int | None = None, zoom: float | None = None, callback: Callable | None = None, keys_to_action: Dict[Tuple[str | int] | str, ActType] | None = None, seed: int | None = None, noop: ActType = 0)#
Allows the user to play the environment using the keyboard.
- Parameters:
env – Environment to use for playing.
transpose – If this is True, the output of observation is transposed. Defaults to True.
fps – Maximum number of steps of the environment executed every second. If None (the default), env.metadata["render_fps"] (or 30, if the environment does not specify "render_fps") is used.
zoom – Factor by which to zoom in on the observation; should be a positive float.
callback – If a callback is provided, it is executed after every step. It takes the following inputs:
    obs_t: observation before performing the action
    obs_tp1: observation after performing the action
    action: action that was executed
    rew: reward that was received
    terminated: whether the environment is terminated or not
    truncated: whether the environment is truncated or not
    info: debug info
keys_to_action – Mapping from keys pressed to action performed. Different formats are supported: a key combination can be expressed as a tuple of unicode code points of the keys, as a tuple of characters, or as a string where each character represents one key. For example, if pressing 'w' and space at the same time is supposed to trigger action number 2, then the keys_to_action dict could look like this:
>>> keys_to_action = {
...     # ...
...     (ord('w'), ord(' ')): 2
...     # ...
... }
or like this:
>>> keys_to_action = {
...     # ...
...     ("w", " "): 2
...     # ...
... }
or like this:
>>> keys_to_action = {
...     # ...
...     "w ": 2
...     # ...
... }
If None, the default keys_to_action mapping for that environment is used, if provided.
seed – Random seed used when resetting the environment. If None, no seed is used.
noop – The action used when no key input has been entered, or the entered key combination is unknown.
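The three key formats described for keys_to_action all name the same key combination. A minimal sketch of how such mappings could be reduced to one canonical form (sorted tuples of key codes) follows. Note that normalize_keys is a hypothetical helper written for illustration; it is not part of the gymnasium API, and the library's internal handling may differ.

```python
from typing import Any, Dict, Tuple, Union

# A key combination: either a string of characters or a tuple of
# characters / integer key codes.
KeyCombo = Union[str, Tuple[Union[str, int], ...]]


def normalize_keys(keys_to_action: Dict[KeyCombo, Any]) -> Dict[Tuple[int, ...], Any]:
    """Map every key combination to a sorted tuple of integer key codes.

    Hypothetical helper for illustration only.
    """
    normalized = {}
    for combo, action in keys_to_action.items():
        # Iterating a string yields its characters; characters become code points.
        codes = tuple(sorted(ord(k) if isinstance(k, str) else k for k in combo))
        normalized[codes] = action
    return normalized


as_codes = normalize_keys({(ord("w"), ord(" ")): 2})
as_chars = normalize_keys({("w", " "): 2})
as_string = normalize_keys({"w ": 2})
print(as_codes == as_chars == as_string)
```

Because the codes are sorted, the order in which keys are listed ("dw" versus "wd") does not matter under this scheme.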
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.utils.play import play
>>> play(gym.make("CarRacing-v2", render_mode="rgb_array"), keys_to_action={
...     "w": np.array([0, 0.7, 0]),
...     "a": np.array([-1, 0, 0]),
...     "s": np.array([0, 0, 1]),
...     "d": np.array([1, 0, 0]),
...     "wa": np.array([-1, 0.7, 0]),
...     "dw": np.array([1, 0.7, 0]),
...     "ds": np.array([1, 0, 1]),
...     "as": np.array([-1, 0, 1]),
... }, noop=np.array([0, 0, 0]))
The above code also works if the environment is wrapped, so it is particularly useful for verifying that frame-level preprocessing does not render the game unplayable.
If you wish to plot real-time statistics as you play, you can use gymnasium.utils.play.PlayPlot. Here is sample code for plotting the reward for the last 150 steps. CartPole defines no default key mapping, so one is passed explicitly.
>>> import gymnasium as gym
>>> from gymnasium.utils.play import PlayPlot, play
>>> def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
...     return [rew,]
>>> plotter = PlayPlot(callback, 150, ["reward"])
>>> play(gym.make("CartPole-v1", render_mode="rgb_array"), callback=plotter.callback,
...      keys_to_action={"a": 0, "d": 1})
- class gymnasium.utils.play.PlayPlot(callback: Callable, horizon_timesteps: int, plot_names: List[str])#
PlayPlot provides the method callback(), which passes its arguments along to the function given at construction and uses the returned values to update live plots of the metrics.
Typically, this callback() will be used in conjunction with play() to see how the metrics evolve as you play:
>>> plotter = PlayPlot(compute_metrics, horizon_timesteps=200,
...                    plot_names=["Immediate Rew.", "Cumulative Rew.", "Action Magnitude"])
>>> play(your_env, callback=plotter.callback)
- Parameters:
callback – Function that computes metrics from environment transitions
horizon_timesteps – The time horizon used for the live plots
plot_names – List of plot titles
- Raises:
DependencyNotInstalled – If matplotlib is not installed
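The metrics function passed to PlayPlot receives one environment transition and returns a list with one value per plot. A minimal sketch of such a function, matching the three plot names used above, might look as follows. The factory make_compute_metrics and the choice to reset the running total at episode end are assumptions made for illustration, not part of the gymnasium API.

```python
import math
import numbers
from typing import Any, Callable, List


def make_compute_metrics() -> Callable[..., List[float]]:
    """Build a metrics function with the argument order PlayPlot passes along.

    Hypothetical helper: a closure is used so the cumulative reward
    persists between calls without a global variable.
    """
    cumulative = 0.0

    def compute_metrics(obs_t: Any, obs_tp1: Any, action: Any, rew: float,
                        terminated: bool, truncated: bool, info: dict) -> List[float]:
        nonlocal cumulative
        cumulative += rew
        # Scalar (discrete) actions use abs(); vector actions use the Euclidean norm.
        if isinstance(action, numbers.Real):
            magnitude = abs(float(action))
        else:
            magnitude = math.hypot(*(float(a) for a in action))
        metrics = [float(rew), cumulative, magnitude]
        # Assumption: start a fresh running total once the episode ends.
        if terminated or truncated:
            cumulative = 0.0
        return metrics

    return compute_metrics


compute_metrics = make_compute_metrics()
first = compute_metrics(None, None, [3.0, 4.0], 1.0, False, False, {})
second = compute_metrics(None, None, 1, 2.0, True, False, {})
third = compute_metrics(None, None, 0, 0.5, False, False, {})
print(first, second, third)
```

The returned list must have the same length and order as plot_names, since each entry feeds the plot at the same position.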
- callback(obs_t: ObsType, obs_tp1: ObsType, action: ActType, rew: float, terminated: bool, truncated: bool, info: dict)#
The callback that calls the provided data callback and adds the data to the plots.
- Parameters:
obs_t – The observation at time step t
obs_tp1 – The observation at time step t+1
action – The action
rew – The reward
terminated – If the environment is terminated
truncated – If the environment is truncated
info – The information from the environment
- class gymnasium.utils.play.PlayableGame(env: Env, keys_to_action: Dict[Tuple[int, ...], int] | None = None, zoom: float | None = None)#
Wraps an environment allowing keyboard inputs to interact with the environment.
- Parameters:
env – The environment to play
keys_to_action – Dictionary mapping tuples of keyboard keys to action values
zoom – Factor by which to zoom in on the environment render
Save Rendering Videos#
- gymnasium.utils.save_video.save_video(frames: list, video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int | None = None, name_prefix: str = 'rl-video', episode_index: int = 0, step_starting_index: int = 0, **kwargs)#
Save videos from rendering frames.
This function extracts videos from a list of rendered frames.
- Parameters:
frames (List[RenderFrame]) – A list of frames to compose the video.
video_folder (str) – The folder where the recordings will be stored
episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode
step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step
video_length (int) – The length of recorded episodes. If it isn’t specified, the entire episode is recorded. Otherwise, snippets of the specified length are captured.
name_prefix (str) – Will be prepended to the filename of the recordings.
episode_index (int) – The index of the current episode.
step_starting_index (int) – The step index of the first frame.
**kwargs – The kwargs that will be passed to moviepy’s ImageSequenceClip. You need to specify either fps or duration.
Example
>>> import gymnasium as gym
>>> from gymnasium.utils.save_video import save_video
>>> env = gym.make("FrozenLake-v1", render_mode="rgb_array_list")
>>> _ = env.reset()
>>> step_starting_index = 0
>>> episode_index = 0
>>> for step_index in range(199):
...     action = env.action_space.sample()
...     _, _, terminated, truncated, _ = env.step(action)
...
...     if terminated or truncated:
...         save_video(
...             env.render(),
...             "videos",
...             fps=env.metadata["render_fps"],
...             step_starting_index=step_starting_index,
...             episode_index=episode_index
...         )
...         step_starting_index = step_index + 1
...         episode_index += 1
...         env.reset()
>>> env.close()
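Both episode_trigger and step_trigger are plain functions from an integer index to a bool. The sketch below shows two ways such triggers could be built; every_nth_episode and at_steps are hypothetical helper names introduced here for illustration and are not provided by gymnasium.

```python
from typing import Callable


def every_nth_episode(n: int) -> Callable[[int], bool]:
    """Return an episode trigger that fires on episodes 0, n, 2n, ... (hypothetical helper)."""
    return lambda episode_index: episode_index % n == 0


def at_steps(*step_indices: int) -> Callable[[int], bool]:
    """Return a step trigger that fires only at the listed step indices (hypothetical helper)."""
    wanted = frozenset(step_indices)
    return lambda step_index: step_index in wanted


episode_trigger = every_nth_episode(10)
step_trigger = at_steps(0, 500)
print([i for i in range(25) if episode_trigger(i)])
print([i for i in range(1000) if step_trigger(i)])
```

Either function can then be passed through the documented keyword arguments, for example save_video(frames, "videos", episode_trigger=episode_trigger, fps=30).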
- gymnasium.utils.save_video.capped_cubic_video_schedule(episode_id: int) bool #
The default episode trigger.
This function will trigger recordings at the episode indices 0, 1, 8, 27, …, \(k^3\), …, 729, 1000, 2000, 3000, …
- Parameters:
episode_id – The episode number
- Returns:
Whether a recording should be triggered for this episode
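The schedule can be read as "perfect cubes below 1000, then every multiple of 1000". The following is a re-implementation sketch of that rule for illustration, under the assumption that this is all the schedule does; cubic_schedule_sketch is a hypothetical name, not the library function.

```python
def cubic_schedule_sketch(episode_id: int) -> bool:
    """Return True for perfect cubes below 1000 and for multiples of 1000 afterwards."""
    if episode_id < 1000:
        # Round the cube root to the nearest integer and check it cubes back exactly.
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    return episode_id % 1000 == 0


recorded_early = [i for i in range(1000) if cubic_schedule_sketch(i)]
recorded_late = [i for i in range(1000, 3001) if cubic_schedule_sketch(i)]
print(recorded_early)
print(recorded_late)
```

Recording becomes sparser as training proceeds, which keeps the number of saved videos small over long runs.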
Old to New Step API Compatibility#
- gymnasium.utils.step_api_compatibility.step_api_compatibility(step_returns: Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, dict | list], output_truncation_bool: bool = True, is_vector_env: bool = False) Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, dict | list] #
Function to transform step returns to the API specified by output_truncation_bool.
The done (old) step API refers to the step() method returning (observation, reward, done, info).
The terminated/truncated (new) step API refers to the step() method returning (observation, reward, terminated, truncated, info).
(Refer to the docs for details on the API change.)
- Parameters:
step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)
output_truncation_bool (bool) – Whether the output should return two booleans (new API) or one (old) (True by default)
is_vector_env (bool) – Whether the step_returns are from a vector environment
- Returns:
step_returns (tuple) – Depending on output_truncation_bool, it can return (obs, rew, done, info) or (obs, rew, terminated, truncated, info)
Example
This function can be used to ensure compatibility between step interfaces with conflicting APIs, e.g. if the env is written in the old API, the wrapper is written in the new API, and the final step output is desired in the old API.
>>> import gymnasium as gym
>>> from gymnasium.utils.step_api_compatibility import step_api_compatibility
>>> env = gym.make("CartPole-v0")
>>> _ = env.reset()
>>> obs, rewards, done, info = step_api_compatibility(env.step(0), output_truncation_bool=False)
>>> obs, rewards, terminated, truncated, info = step_api_compatibility(env.step(0), output_truncation_bool=True)
>>> vec_env = gym.vector.make("CartPole-v0")
>>> _ = vec_env.reset()
>>> obs, rewards, dones, infos = step_api_compatibility(vec_env.step([0]), is_vector_env=True, output_truncation_bool=False)
>>> obs, rewards, terminated, truncated, info = step_api_compatibility(vec_env.step([0]), is_vector_env=True, output_truncation_bool=True)
- gymnasium.utils.step_api_compatibility.convert_to_terminated_truncated_step_api(step_returns: Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, bool | ndarray, dict | list], is_vector_env=False) Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, bool | ndarray, dict | list] #
Function to transform step returns to the new step API irrespective of the input API.
- Parameters:
step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)
is_vector_env (bool) – Whether the step_returns are from a vector environment
- gymnasium.utils.step_api_compatibility.convert_to_done_step_api(step_returns: Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, dict | list], is_vector_env: bool = False) Tuple[ObsType | ndarray, SupportsFloat | ndarray, bool | ndarray, dict | list] #
Function to transform step returns to the old step API irrespective of the input API.
- Parameters:
step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)
is_vector_env (bool) – Whether the step_returns are from a vector environment
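The relationship between the two step APIs can be sketched for a single (non-vector) environment as follows. This is an illustrative sketch, not the library's implementation: the function names are hypothetical, and carrying the truncation flag in info under the key "TimeLimit.truncated" is an assumption about how the old API conveys truncation.

```python
from typing import Any, Tuple


def to_done_api_sketch(step_returns: Tuple[Any, ...]) -> Tuple[Any, ...]:
    """Collapse (terminated, truncated) into a single done flag (hypothetical helper)."""
    if len(step_returns) == 4:
        return step_returns  # already in the old API
    obs, rew, terminated, truncated, info = step_returns
    info = dict(info)  # copy so the caller's dict is not mutated
    if terminated or truncated:
        # Assumed convention: remember whether the episode ended by truncation only.
        info["TimeLimit.truncated"] = truncated and not terminated
    return obs, rew, terminated or truncated, info


def to_terminated_truncated_api_sketch(step_returns: Tuple[Any, ...]) -> Tuple[Any, ...]:
    """Split a done flag back into (terminated, truncated) (hypothetical helper)."""
    if len(step_returns) == 5:
        return step_returns  # already in the new API
    obs, rew, done, info = step_returns
    info = dict(info)
    truncated = info.pop("TimeLimit.truncated", False)
    return obs, rew, done and not truncated, done and truncated, info


new_style = ("obs", 1.0, False, True, {})
old_style = to_done_api_sketch(new_style)
round_trip = to_terminated_truncated_api_sketch(old_style)
print(old_style)
print(round_trip)
```

Converting to the old API loses information unless the truncation flag is stored somewhere, which is why the sketch stashes it in info before merging the two booleans.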
Seeding#
- gymnasium.utils.seeding.np_random(seed: int | None = None) Tuple[Generator, Any] #
Generates a random number generator from the seed and returns the Generator and seed.
- Parameters:
seed – The seed used to create the generator
- Returns:
The generator and resulting seed
- Raises:
Error – If the seed is neither a non-negative integer nor None
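The documented behavior (a generator plus the seed that produced it) can be sketched with NumPy alone. The sketch below assumes the generator is built from a numpy.random.SeedSequence; np_random_sketch is a hypothetical stand-in, and it raises ValueError where the documented function raises the library's own Error.

```python
from typing import Optional, Tuple

import numpy as np


def np_random_sketch(seed: Optional[int] = None) -> Tuple[np.random.Generator, int]:
    """Return a NumPy Generator and the seed that was actually used (hypothetical helper)."""
    if seed is not None and not (isinstance(seed, int) and seed >= 0):
        raise ValueError(f"Seed must be a non-negative integer or omitted, got {seed!r}")
    # With seed=None, SeedSequence draws fresh entropy, which is returned
    # so the run can be reproduced later.
    seed_seq = np.random.SeedSequence(seed)
    rng = np.random.Generator(np.random.PCG64(seed_seq))
    return rng, seed_seq.entropy


rng_a, seed_a = np_random_sketch(42)
rng_b, _ = np_random_sketch(42)
draws_a = rng_a.integers(0, 100, size=5).tolist()
draws_b = rng_b.integers(0, 100, size=5).tolist()
print(seed_a, draws_a == draws_b)
```

Returning the seed matters mostly for the unseeded case: logging it makes an otherwise random run repeatable.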
Environment Checking#
- gymnasium.utils.env_checker.check_env(env: Env, warn: bool | None = None, skip_render_check: bool = False)#
Checks that an environment follows the Gymnasium API.
This is an invasive function that calls the environment’s reset and step.
This is particularly useful when using a custom environment. Please take a look at https://gymnasium.farama.org/content/environment_creation/ for more information about the API.
- Parameters:
env – The Gymnasium environment that will be checked
warn – Ignored
skip_render_check – Whether to skip the checks for the render method. False by default (skipping is useful for CI)