Wrappers
- class gymnasium.vector.VectorWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation.
This class is the base class for all wrappers for vectorized environments. The subclass could override some methods to change the behavior of the original vectorized environment without touching the original code.
Note
Don’t forget to call super().__init__(env) if the subclass overrides __init__().
- Parameters:
env – The environment to wrap
- step(actions: ActType) -> tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]]
Step through all environments using the actions, returning the batched data.
- reset(*, seed: int | list[int] | None = None, options: dict[str, Any] | None = None) -> tuple[ObsType, dict[str, Any]]
Reset all environments using seed and options.
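The note about calling super().__init__(env) can be illustrated with a small, dependency-free sketch. FakeVectorEnv, BaseVectorWrapper and StepCounter are hypothetical stand-ins, not Gymnasium classes; they only mimic the idea that the base constructor stores the wrapped environment that step and reset later forward to.

```python
class FakeVectorEnv:
    """Hypothetical stand-in for a vector environment with two sub-environments."""

    num_envs = 2

    def reset(self, *, seed=None, options=None):
        return [0.0] * self.num_envs, {}

    def step(self, actions):
        obs = [float(a) for a in actions]
        rewards = [1.0] * self.num_envs
        terminated = [False] * self.num_envs
        truncated = [False] * self.num_envs
        return obs, rewards, terminated, truncated, {}


class BaseVectorWrapper:
    """Hypothetical stand-in for the base wrapper: stores env and forwards calls."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        return self.env.reset(seed=seed, options=options)

    def step(self, actions):
        return self.env.step(actions)


class StepCounter(BaseVectorWrapper):
    """Counts calls to step() without touching the wrapped environment's code."""

    def __init__(self, env):
        super().__init__(env)  # without this call, self.env would never be set
        self.steps = 0

    def step(self, actions):
        self.steps += 1
        return super().step(actions)


envs = StepCounter(FakeVectorEnv())
obs, info = envs.reset(seed=123)
obs, rewards, terminated, truncated, info = envs.step([1, 0])
print(envs.steps, obs)
```

If StepCounter skipped the base constructor, the first call to reset or step would fail because the wrapped environment was never stored.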
- class gymnasium.vector.VectorObservationWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the observation.
Equivalent to gymnasium.ObservationWrapper for vectorized environments.
- Parameters:
env – Vector environment.
- class gymnasium.vector.VectorActionWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the actions.
Equivalent to gymnasium.ActionWrapper for vectorized environments.
- Parameters:
env – The environment to wrap
- class gymnasium.vector.VectorRewardWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the reward.
Equivalent to gymnasium.RewardWrapper for vectorized environments.
- Parameters:
env – The environment to wrap
Vector Only wrappers
- class gymnasium.wrappers.vector.DictInfoToList(env: VectorEnv)
Converts infos of vectorized environments from dict to List[dict].
This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. It is intended to be used around vectorized environments. If other wrappers that operate on info, such as RecordEpisodeStatistics, are also used, this needs to be the outermost wrapper, i.e.
DictInfoToList(RecordEpisodeStatistics(vector_env))
Example
>>> import numpy as np
>>> dict_info = {
...      "k": np.array([0., 0., 0.5, 0.3]),
...      "_k": np.array([False, False, True, True])
...  }
...
>>> list_info = [{}, {}, {"k": 0.5}, {"k": 0.3}]
- Example for vector environments:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DictInfoToList
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> obs, info = envs.reset(seed=123)
>>> info
{}
>>> envs = DictInfoToList(envs)
>>> obs, info = envs.reset(seed=123)
>>> info
[{}, {}, {}]
- Another example for vector environments:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DictInfoToList
>>> envs = gym.make_vec("HalfCheetah-v4", num_envs=2)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
{'x_position': array([0.03332211, 0.10172355]), '_x_position': array([ True, True]), 'x_velocity': array([-0.06296527, 0.89345848]), '_x_velocity': array([ True, True]), 'reward_run': array([-0.06296527, 0.89345848]), '_reward_run': array([ True, True]), 'reward_ctrl': array([-0.24503504, -0.21944423], dtype=float32), '_reward_ctrl': array([ True, True])}
>>> envs = DictInfoToList(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
[{'x_position': np.float64(0.0333221090036294), 'x_velocity': np.float64(-0.06296527291998574), 'reward_run': np.float64(-0.06296527291998574), 'reward_ctrl': np.float32(-0.24503504)}, {'x_position': np.float64(0.10172354684460168), 'x_velocity': np.float64(0.8934584807363618), 'reward_run': np.float64(0.8934584807363618), 'reward_ctrl': np.float32(-0.21944423)}]
- Change logs:
v0.24.0 - Initially added as VectorListInfo
v1.0.0 - Renamed to DictInfoToList
- Parameters:
env (Env) – The environment to apply the wrapper to
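The dict-to-list conversion described above can be sketched without any dependencies. This is a minimal illustration, not the wrapper's implementation: dict_info_to_list is a hypothetical helper, plain lists stand in for NumPy arrays, and nested info dictionaries are not handled. It assumes, as in the first example, that each key "k" comes with a boolean mask "_k" marking which sub-environments actually reported a value.

```python
def dict_info_to_list(info, num_envs):
    """Convert {"k": values, "_k": mask} into one dict per sub-environment."""
    list_info = [{} for _ in range(num_envs)]
    for key, values in info.items():
        if key.startswith("_"):
            continue  # mask entries only say which indices are valid
        # Assumption: a missing mask means every sub-environment reported a value.
        mask = info.get(f"_{key}", [True] * num_envs)
        for index in range(num_envs):
            if mask[index]:
                list_info[index][key] = values[index]
    return list_info


dict_info = {
    "k": [0.0, 0.0, 0.5, 0.3],
    "_k": [False, False, True, True],
}
list_info = dict_info_to_list(dict_info, num_envs=4)
print(list_info)  # [{}, {}, {'k': 0.5}, {'k': 0.3}]
```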
- class gymnasium.wrappers.vector.VectorizeTransformObservation(env: VectorEnv, wrapper: type[TransformObservation], **kwargs: Any)
Vectorizes a single-agent transform observation wrapper for vector environments.
Most of the lambda observation wrappers for single-agent environments have vectorized implementations; users are advised to use those instead by importing from gymnasium.wrappers.vector…. The following examples illustrate a use case where a custom lambda observation wrapper is required.
- Example - The normal observation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)
- Example - Applying a custom lambda observation wrapper that duplicates the observation from the environment:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.wrappers.vector import VectorizeTransformObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> old_space = envs.single_observation_space
>>> new_space = Box(low=np.array([old_space.low, old_space.low]), high=np.array([old_space.high, old_space.high]))
>>> envs = VectorizeTransformObservation(envs, wrapper=TransformObservation, func=lambda x: np.array([x, x]), observation_space=new_space)
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
        [ 0.01823519, -0.0446179 , -0.02796401, -0.03156282]],
       [[ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
        [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598]],
       [[ 0.03517495, -0.000635  , -0.01098382, -0.03203924],
        [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]]],
      dtype=float32)
- Parameters:
env – The vector environment to wrap.
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
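The idea of vectorizing a single-observation transform can be sketched as applying the function to each sub-environment's observation and re-batching the results. This is only an illustration under simplifying assumptions: vectorize_single_func and duplicate are hypothetical names, nested lists stand in for arrays, and the sketch does not claim to mirror how the wrapper is implemented internally.

```python
def vectorize_single_func(single_func):
    """Lift a per-observation function to a batch of observations."""

    def batched_func(batch):
        return [single_func(observation) for observation in batch]

    return batched_func


def duplicate(observation):
    # Single-environment transform: stack the observation with itself.
    return [observation, observation]


batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 sub-environments
transformed = vectorize_single_func(duplicate)(batch)
print(len(transformed), len(transformed[0]), len(transformed[0][0]))  # 3 2 2
```

The batch dimension stays first and each sub-environment gains a leading axis of size 2, which matches the shape change seen in the duplicated-observation example.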
- class gymnasium.wrappers.vector.VectorizeTransformAction(env: VectorEnv, wrapper: type[TransformAction], **kwargs: Any)
Vectorizes a single-agent transform action wrapper for vector environments.
- Example - Without action transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
- Example - Adding a transform that applies a ReLU to the action:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformAction
>>> from gymnasium.wrappers.vector import VectorizeTransformAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformAction(envs, wrapper=TransformAction, func=lambda x: (x > 0.0) * x, action_space=envs.single_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4354835e-01, -5.9898634e-04],
       [-4.3034542e-01, -6.9532328e-04]], dtype=float32)
- Parameters:
env – The vector environment to wrap
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
- class gymnasium.wrappers.vector.VectorizeTransformReward(env: VectorEnv, wrapper: type[TransformReward], **kwargs: Any)
Vectorizes a single-agent transform reward wrapper for vector environments.
- An example that applies a ReLU to the reward:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> from gymnasium.wrappers.vector import VectorizeTransformReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformReward(envs, wrapper=TransformReward, func=lambda x: (x > 0.0) * x)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> rew
array([-0., -0., -0.])
- Parameters:
env – The vector environment to wrap.
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
Vectorized Common wrappers
- class gymnasium.wrappers.vector.RecordEpisodeStatistics(env: VectorEnv, buffer_length: int = 100, stats_key: str = 'episode')
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of any episode within the vectorized env, the statistics of the episode will be added to info using the key episode, and the _episode key is used to indicate the environment index which has a terminated or truncated episode.
>>> infos = {
...     ...
...     "episode": {
...         "r": "<array of cumulative reward for each done sub-environment>",
...         "l": "<array of episode length for each done sub-environment>",
...         "t": "<array of elapsed time since beginning of episode for each done sub-environment>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
return_queue – The cumulative rewards of the last buffer_length-many episodes
length_queue – The lengths of the last buffer_length-many episodes
Example
>>> from pprint import pprint
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RecordEpisodeStatistics
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> envs = RecordEpisodeStatistics(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> end = False
>>> while not end:
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...     end = term.any() or trunc.any()
...
>>> envs.close()
>>> pprint(info)
{'_episode': array([ True, False, False]),
 '_final_info': array([ True, False, False]),
 '_final_observation': array([ True, False, False]),
 'episode': {'l': array([11,  0,  0], dtype=int32),
             'r': array([11.,  0.,  0.], dtype=float32),
             't': array([0.007812, 0.      , 0.      ], dtype=float32)},
 'final_info': array([{}, None, None], dtype=object),
 'final_observation': array([array([ 0.11448676,  0.9416149 , -0.20946532, -1.7619033 ], dtype=float32),
       None, None], dtype=object)}
- Parameters:
env (Env) – The environment to apply the wrapper to
buffer_length – The size of the buffers return_queue, length_queue and time_queue
stats_key – The info key to save the data
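The bookkeeping described for this wrapper can be sketched in plain Python. EpisodeStatsTracker is a hypothetical class, not part of Gymnasium; it uses lists instead of arrays and leaves out the elapsed-time statistic, so treat it as an illustration of the accumulate-then-reset logic rather than the wrapper's implementation.

```python
from collections import deque


class EpisodeStatsTracker:
    """Tracks per-sub-environment returns and lengths; names are hypothetical."""

    def __init__(self, num_envs, buffer_length=100):
        self.returns = [0.0] * num_envs
        self.lengths = [0] * num_envs
        # Bounded buffers that keep only the most recent finished episodes.
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)

    def update(self, rewards, terminated, truncated):
        """Accumulate one step and return (episode_stats, done_mask)."""
        num_envs = len(rewards)
        stats = {"r": [0.0] * num_envs, "l": [0] * num_envs}
        done_mask = [False] * num_envs
        for i in range(num_envs):
            self.returns[i] += rewards[i]
            self.lengths[i] += 1
            if terminated[i] or truncated[i]:
                done_mask[i] = True
                stats["r"][i] = self.returns[i]
                stats["l"][i] = self.lengths[i]
                self.return_queue.append(self.returns[i])
                self.length_queue.append(self.lengths[i])
                # Start counting afresh for the next episode.
                self.returns[i], self.lengths[i] = 0.0, 0
        return stats, done_mask


tracker = EpisodeStatsTracker(num_envs=3)
for step in range(11):
    # Only sub-environment 0 terminates, on the 11th step.
    terminated = [step == 10, False, False]
    stats, mask = tracker.update([1.0, 1.0, 1.0], terminated, [False] * 3)
print(stats, mask)
```

Here stats plays the role of the "episode" entry and mask the role of "_episode": only the first sub-environment reports a return and length of 11, while the other two keep accumulating.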
Implemented Observation wrappers
- class gymnasium.wrappers.vector.TransformObservation(env: VectorEnv, func: Callable[[ObsType], Any], observation_space: Space | None = None)
Transforms an observation via a function provided to the wrapper.
This function allows the manual specification of the vector-observation function as well as the single-observation function. This is desirable when, for example, it is possible to process vector observations in parallel or via other more optimized methods. Otherwise, VectorizeTransformObservation should be used instead, where only single_func needs to be defined.
- Example - Without observation transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)
>>> envs.close()
- Example - With observation transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformObservation
>>> def scale_and_shift(obs):
...     return (obs - 1.0) * 2.0
...
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> new_obs_space = Box(low=envs.observation_space.low, high=envs.observation_space.high)
>>> envs = TransformObservation(envs, func=scale_and_shift, observation_space=new_obs_space)
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[-1.9635296, -2.0892358, -2.055928 , -2.0631256],
       [-1.9429494, -1.9428282, -1.9061728, -1.9503881],
       [-1.9296501, -2.00127  , -2.0219676, -2.0640786]], dtype=float32)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
func – A function that will transform the vector observation. If this transformed observation is outside the observation space of env.observation_space then provide an observation_space.
observation_space – The observation spaces of the wrapper, if None, then it is assumed the same as env.observation_space.
- class gymnasium.wrappers.vector.FilterObservation(env: VectorEnv, filter_keys: Sequence[str | int])
Vector wrapper for filtering dict or tuple observation spaces.
- Example - Create a vectorized environment with a Dict space to demonstrate how to filter keys:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Dict, Box
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.wrappers.vector import VectorizeTransformObservation, FilterObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> make_dict = lambda x: {"obs": x, "junk": np.array([0.0])}
>>> new_space = Dict({"obs": envs.single_observation_space, "junk": Box(low=-1.0, high=1.0)})
>>> envs = VectorizeTransformObservation(env=envs, wrapper=TransformObservation, func=make_dict, observation_space=new_space)
>>> envs = FilterObservation(envs, ["obs"])
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
{'obs': array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)}
- Parameters:
env – The vector environment to wrap
filter_keys – The subspaces to be included; use a list of strings or integers for Dict and Tuple spaces respectively
- class gymnasium.wrappers.vector.FlattenObservation(env: VectorEnv)
Observation wrapper that flattens the observation.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import FlattenObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = FlattenObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 27648)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
- class gymnasium.wrappers.vector.GrayscaleObservation(env: VectorEnv, keep_dim: bool = False)
Observation wrapper that converts an RGB image to grayscale.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import GrayscaleObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = GrayscaleObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
keep_dim – Whether to keep the channel dimension in the observation; if True, each sub-environment observation has len(obs.shape) == 3, else len(obs.shape) == 2
- class gymnasium.wrappers.vector.ResizeObservation(env: VectorEnv, shape: tuple[int, ...])
Resizes image observations using OpenCV to shape.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ResizeObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ResizeObservation(envs, shape=(28, 28))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 28, 28, 3)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
shape – The resized observation shape
- class gymnasium.wrappers.vector.ReshapeObservation(env: VectorEnv, shape: int | tuple[int, ...])
Reshapes array-based observations to the given shape.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ReshapeObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ReshapeObservation(envs, shape=(9216, 3))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 9216, 3)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
shape – The reshaped observation space
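Reshaping only rearranges elements, so the target shape has to describe the same number of elements as the original observation. The following arithmetic check, a small sketch using the per-observation shapes from the CarRacing-v3 examples, confirms that the flattened and reshaped forms are consistent with the original image shape.

```python
import math

original_shape = (96, 96, 3)   # single observation shape in the examples
flattened_shape = (27648,)     # per-observation shape after flattening
reshaped_shape = (9216, 3)     # shape passed to ReshapeObservation in the example

# All three shapes describe the same number of elements.
sizes = [math.prod(shape) for shape in (original_shape, flattened_shape, reshaped_shape)]
print(sizes)  # [27648, 27648, 27648]
```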
- class gymnasium.wrappers.vector.RescaleObservation(env: VectorEnv, min_obs: np.floating | np.integer | np.ndarray, max_obs: np.floating | np.integer | np.ndarray)
Linearly rescales observation to between a minimum and maximum value.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RescaleObservation
>>> envs = gym.make_vec("MountainCar-v0", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
np.float32(-0.46352962)
>>> obs.max()
np.float32(0.0)
>>> envs = RescaleObservation(envs, min_obs=-5.0, max_obs=5.0)
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
np.float32(-0.90849805)
>>> obs.max()
np.float32(0.0)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
min_obs – The new minimum observation bound
max_obs – The new maximum observation bound
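The linear rescaling can be checked by hand against the example output. The sketch below uses a hypothetical helper, rescale, and assumes the MountainCar-v0 observation bounds are position in [-1.2, 0.6] and velocity in [-0.07, 0.07]; under that assumption it approximately reproduces the minimum of about -0.9085 and the maximum of 0.0 shown in the example.

```python
def rescale(value, low, high, new_min, new_max):
    """Linearly map value from [low, high] to [new_min, new_max]."""
    fraction = (value - low) / (high - low)
    return new_min + fraction * (new_max - new_min)


# Assumed MountainCar-v0 bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
position = rescale(-0.46352962, low=-1.2, high=0.6, new_min=-5.0, new_max=5.0)
velocity = rescale(0.0, low=-0.07, high=0.07, new_min=-5.0, new_max=5.0)
print(round(position, 4), round(velocity, 4))
```

A velocity of 0.0 sits at the midpoint of its original range, so it maps to the midpoint of [-5, 5], which is again 0.0.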
- class gymnasium.wrappers.vector.DtypeObservation(env: VectorEnv, dtype: Any)
Observation wrapper for transforming the dtype of an observation.
Example
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DtypeObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float32')
>>> envs = DtypeObservation(envs, dtype=np.float64)
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float64')
>>> envs.close()
- Parameters:
env – The vector environment to wrap
dtype – The new dtype of the observation
- class gymnasium.wrappers.vector.NormalizeObservation(env: VectorEnv, epsilon: float = 1e-8)
This wrapper will normalize observations such that each coordinate is centered with unit variance.
The property _update_running_mean allows freezing or continuing the running mean calculation of the observation statistics. If True (default), the RunningMeanStd will get updated on every step and reset call. If False, the calculated statistics are used but no longer updated; this may be used during evaluation.
Note
The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently.
- Example without the normalize observation wrapper:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
>>> np.mean(obs)
np.float32(0.024251968)
>>> np.std(obs)
np.float32(0.62259156)
>>> envs.close()
- Example with the normalize observation wrapper:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import NormalizeObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs = NormalizeObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
>>> np.mean(obs)
np.float32(-0.2359734)
>>> np.std(obs)
np.float32(1.1938739)
>>> envs.close()
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon – A stability parameter that is used when scaling the observations.
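The running-statistics normalization can be sketched for a single scalar coordinate. This is a simplified illustration, not the library's code: the RunningMeanStd class below is written from scratch, its initial values (mean 0, variance 1, a tiny pseudo-count) are assumptions, and the update flag of the hypothetical normalize helper stands in for the freeze behaviour described above.

```python
import math


class RunningMeanStd:
    """Running mean and variance, merged batch by batch (scalar sketch)."""

    def __init__(self, initial_count=1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = initial_count  # assumed tiny pseudo-count to avoid division by zero

    def update(self, batch):
        batch_count = len(batch)
        batch_mean = sum(batch) / batch_count
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / batch_count
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Parallel-variance merge of the old statistics with the new batch.
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean += delta * batch_count / total
        self.var = m2 / total
        self.count = total


def normalize(batch, stats, epsilon=1e-8, update=True):
    """Center and scale a batch; update=False freezes the statistics."""
    if update:
        stats.update(batch)
    return [(x - stats.mean) / math.sqrt(stats.var + epsilon) for x in batch]


stats = RunningMeanStd()
for _ in range(50):
    out = normalize([1.0, 2.0, 3.0, 4.0], stats)
count_before = stats.count
frozen = normalize([10.0], stats, update=False)  # evaluation: statistics untouched
print(round(stats.mean, 3), round(stats.var, 3), stats.count == count_before)
```

Early calls are dominated by the assumed initial values, which is one way to see why a newly instantiated wrapper does not yet normalize correctly.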
Implemented Action wrappers
- class gymnasium.wrappers.vector.TransformAction(env: VectorEnv, func: Callable[[ActType], Any], action_space: Space | None = None)
Transforms an action via a function provided to the wrapper.
The function func will be applied to all vector actions. If the actions from func are outside the bounds of the env’s action space, provide an action_space which specifies the action space for the vectorized environment.
- Example - Without action transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.46553135, -0.00142543],
       [-0.498371  , -0.00715587],
       [-0.46515748, -0.00624371]], dtype=float32)
- Example - With action transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformAction
>>> def shrink_action(act):
...     return act * 0.3
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> new_action_space = Box(low=shrink_action(envs.action_space.low), high=shrink_action(envs.action_space.high))
>>> envs = TransformAction(env=envs, func=shrink_action, action_space=new_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.48468155, -0.00372536],
       [-0.47599354, -0.00545912],
       [-0.46543318, -0.00615723]], dtype=float32)
- Parameters:
env – The vector environment to wrap
func – A function that will transform an action. If this transformed action is outside the action space of env.action_space then provide an action_space.
action_space – The action spaces of the wrapper, if None, then it is assumed the same as env.action_space.
- class gymnasium.wrappers.vector.ClipAction(env: VectorEnv)
Clip the continuous action within the valid Box action space bounds.
- Example - Passing an out-of-bounds action to the environment to be clipped.
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ClipAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipAction(envs)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(np.array([5.0, -5.0, 2.0]))
>>> envs.close()
>>> obs
array([[-0.4624777 ,  0.00105192],
       [-0.44504836, -0.00209899],
       [-0.42884544,  0.00080468]], dtype=float32)
- Parameters:
env – The vector environment to wrap
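Clipping amounts to clamping every action component to the space's bounds before it reaches the environment. A minimal sketch, assuming bounds of [-1, 1] and using a hypothetical clip_action helper with plain lists in place of arrays:

```python
def clip_action(action, low, high):
    """Clamp each action component to the closed interval [low, high]."""
    return [min(max(component, low), high) for component in action]


# Out-of-bounds actions for three sub-environments, assuming bounds [-1, 1].
actions = [[5.0], [-5.0], [2.0]]
clipped = [clip_action(action, low=-1.0, high=1.0) for action in actions]
print(clipped)  # [[1.0], [-1.0], [1.0]]
```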
- class gymnasium.wrappers.vector.RescaleAction(env: VectorEnv, min_action: float | int | np.ndarray, max_action: float | int | np.ndarray)
Affinely rescales the continuous action space of the environment to the range [min_action, max_action].
- Example - Without action scaling:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.44799727,  0.00266526],
       [-0.4351738 ,  0.00133522],
       [-0.42683297,  0.00048403]], dtype=float32)
- Example - With action scaling:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RescaleAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = RescaleAction(envs, 0.0, 1.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.48657528, -0.00395268],
       [-0.47377947, -0.00529102],
       [-0.46546045, -0.00614867]], dtype=float32)
- Parameters:
env (Env) – The vector environment to wrap
min_action (float, int or np.ndarray) – The min values for each action. This may be a numpy array or a scalar.
max_action (float, int or np.ndarray) – The max values for each action. This may be a numpy array or a scalar.
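The affine map behind the rescaling can be sketched as follows. rescale_action is a hypothetical helper, and the environment bounds of [-1, 1] are an assumption; the point is only to show how an action given in the wrapper's range [min_action, max_action] is translated to the environment's own range.

```python
def rescale_action(action, min_action, max_action, low, high):
    """Map an action from the wrapper's range to the environment's range."""
    fraction = (action - min_action) / (max_action - min_action)
    return low + fraction * (high - low)


# Wrapper range [0, 1] mapped onto assumed environment bounds [-1, 1].
for action in (0.0, 0.5, 1.0):
    print(action, "->", rescale_action(action, 0.0, 1.0, low=-1.0, high=1.0))
```

Under these assumptions an action of 0.5 is the midpoint of the wrapper's range and therefore reaches the environment as 0.0, which is consistent with the two examples producing different observations for the same input.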
Implemented Reward wrappers
- class gymnasium.wrappers.vector.TransformReward(env: VectorEnv, func: Callable[[ArrayType], ArrayType])
A reward wrapper that allows a custom function to modify the step reward.
- Example with reward transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformReward
>>> def scale_and_shift(rew):
...     return (rew - 1.0) * 2.0
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = TransformReward(env=envs, func=scale_and_shift)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
- Parameters:
env (Env) – The vector environment to wrap
func (Callable) – The function to apply to the reward
- class gymnasium.wrappers.vector.ClipReward(env: VectorEnv, min_reward: float | np.ndarray | None = None, max_reward: float | np.ndarray | None = None)
A wrapper that clips the rewards for an environment between an upper and lower bound.
- Example with clipped rewards:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ClipReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipReward(envs, 0.0, 2.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> rew
array([0., 0., 0.])
- Parameters:
env – The vector environment to wrap
min_reward – The min reward for each step
max_reward – The max reward for each step
- class gymnasium.wrappers.vector.NormalizeReward(env: VectorEnv, gamma: float = 0.99, epsilon: float = 1e-8)
This wrapper will scale rewards such that the discounted returns have approximately unit variance.
In a nutshell, the rewards are divided by the standard deviation of a rolling discounted sum of the reward. The exponential moving average will have variance \((1 - \gamma)^2\).
The property _update_running_mean allows freezing or continuing the running mean calculation of the reward statistics. If True (default), the RunningMeanStd will get updated every time self.normalize() is called. If False, the calculated statistics are used but no longer updated; this may be used during evaluation.
- Important note:
Contrary to what the name suggests, this wrapper does not normalize the rewards to have a mean of 0 and a standard deviation of 1. Instead, it scales the rewards such that discounted returns have approximately unit variance. See [Engstrom et al.](https://openreview.net/forum?id=r1etN1rtPB) on “reward scaling” for more information.
Note
The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.
- Example without the normalize reward wrapper:
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
np.float64(-0.03359492141887935)
>>> np.std(episode_rewards)
np.float64(0.029028230434438706)
- Example with the normalize reward wrapper:
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.wrappers.vector import NormalizeReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> envs = NormalizeReward(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
np.float64(-0.1598639586606745)
>>> np.std(episode_rewards)
np.float64(0.27800309628058434)
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon (float) – A stability parameter
gamma (float) – The discount factor that is used in the exponential moving average.
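The reward scaling described above can be sketched for a single environment. RewardScaler is a hypothetical class, not the wrapper itself: it tracks the variance of the rolling discounted return with a Welford update and starts from empty statistics, both of which are assumptions made to keep the sketch short. It does show the key point from the important note: the reward is divided by a standard deviation, and no mean is subtracted.

```python
import math


class RewardScaler:
    """Scales rewards by the running std of the discounted return (single env)."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        # Welford accumulators for the variance of the discounted return.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0
        self.update_running_mean = True

    def scale(self, reward, terminated=False):
        # Rolling discounted sum; a terminated flag drops the carried-over return.
        carry = 0.0 if terminated else self.gamma * self.discounted_return
        self.discounted_return = carry + reward
        if self.update_running_mean:
            self.count += 1
            delta = self.discounted_return - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (self.discounted_return - self.mean)
        variance = self.m2 / self.count if self.count else 1.0
        # The reward is only divided by the std; no mean is subtracted.
        return reward / math.sqrt(variance + self.epsilon)


scaler = RewardScaler(gamma=0.5)
scaled = [scaler.scale(1.0) for _ in range(50)]
print(scaler.discounted_return, scaled[-1])
```

With a constant reward of 1 and gamma of 0.5, the discounted return converges to 2. The very first scaled reward is huge because the variance estimate is still zero, which illustrates why rewards are not scaled correctly right after the wrapper is instantiated.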
Implemented Data Conversion wrappers
- class gymnasium.wrappers.vector.JaxToNumpy(env: VectorEnv)
Wraps a Jax-based vector environment so that it can be interacted with through NumPy arrays.
Notes
A vectorized version of gymnasium.wrappers.JaxToNumpy.
Actions must be provided as NumPy arrays and observations, rewards, terminations and truncations will be returned as NumPy arrays.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import JaxToNumpy
>>> envs = gym.make_vec("JaxEnv-vx", 3)
>>> envs = JaxToNumpy(envs)
- Parameters:
env – The Jax-based vector environment to wrap
- class gymnasium.wrappers.vector.JaxToTorch(env: VectorEnv, device: Device | None = None)
Wraps a Jax-based vector environment so that it can be interacted with through PyTorch Tensors.
Actions must be provided as PyTorch Tensors and observations, rewards, terminations and truncations will be returned as PyTorch Tensors.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import JaxToTorch
>>> envs = gym.make_vec("JaxEnv-vx", 3)
>>> envs = JaxToTorch(envs)
- Parameters:
env – The Jax-based vector environment to wrap
device – The device the torch Tensors should be moved to
- class gymnasium.wrappers.vector.NumpyToTorch(env: VectorEnv, device: Device | None = None)
Wraps a NumPy-based vector environment so that it can be interacted with through PyTorch Tensors.
Example
>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import NumpyToTorch
>>> envs = gym.make_vec("CartPole-v1", 3)
>>> envs = NumpyToTorch(envs)
>>> obs, _ = envs.reset(seed=123)
>>> type(obs)
<class 'torch.Tensor'>
>>> action = torch.tensor(envs.action_space.sample())
>>> obs, reward, terminated, truncated, info = envs.step(action)
>>> envs.close()
>>> type(obs)
<class 'torch.Tensor'>
>>> type(reward)
<class 'torch.Tensor'>
>>> type(terminated)
<class 'torch.Tensor'>
>>> type(truncated)
<class 'torch.Tensor'>
- Parameters:
env – The NumPy-based vector environment to wrap
device – The device the torch Tensors should be moved to