Vectorize

gymnasium.vector.VectorEnv

class gymnasium.vector.VectorEnv

Base class for vectorized environments to run multiple independent copies of the same environment in parallel.

Vector environments can provide a linear speed-up in the steps taken per second by sampling multiple sub-environments at the same time. Gymnasium contains two generalised vector environments, AsyncVectorEnv and SyncVectorEnv, along with several custom vector environment implementations. Both reset() and step() batch the observations, rewards, terminations, truncations and info of every sub-environment; see the example below. The rewards, terminations and truncations are packaged into NumPy arrays of shape (num_envs,). For observations (and actions), the batching process depends on the type of the observation (and action) space and is generally optimised for neural network inputs and outputs. For info, the data is kept as a dictionary in which each key gives the data for all sub-environments.
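To make the batched layout concrete, here is a minimal sketch of what batching per-sub-environment step results looks like. The helper batch_step_results is hypothetical (it is not part of Gymnasium) and assumes array observations and scalar info values. The "_key" boolean mask follows the convention Gymnasium's vector environments use to record which sub-environments provided a given info key.

```python
import numpy as np


def batch_step_results(results):
    """Batch a list of (obs, reward, terminated, truncated, info) tuples, one per sub-environment.

    Hypothetical helper: assumes array observations and scalar info values.
    """
    num_envs = len(results)
    # Array observations are stacked along a new leading axis of size num_envs
    observations = np.stack([obs for obs, _, _, _, _ in results])
    # Rewards, terminations and truncations become arrays of shape (num_envs,)
    rewards = np.array([reward for _, reward, _, _, _ in results], dtype=np.float64)
    terminations = np.array([term for _, _, term, _, _ in results], dtype=bool)
    truncations = np.array([trunc for _, _, _, trunc, _ in results], dtype=bool)

    # Info stays a dict: each key maps to one value per sub-environment,
    # plus a "_key" boolean mask recording which sub-environments set that key
    infos = {}
    for i, (_, _, _, _, info) in enumerate(results):
        for key, value in info.items():
            if key not in infos:
                infos[key] = np.zeros(num_envs, dtype=np.asarray(value).dtype)
                infos[f"_{key}"] = np.zeros(num_envs, dtype=bool)
            infos[key][i] = value
            infos[f"_{key}"][i] = True
    return observations, rewards, terminations, truncations, infos


# Three hypothetical sub-environment results; only the second one reports "episode_length"
results = [
    (np.zeros(4), 1.0, False, False, {}),
    (np.ones(4), 1.0, True, False, {"episode_length": 12}),
    (np.full(4, 2.0), 0.5, False, True, {}),
]
observations, rewards, terminations, truncations, infos = batch_step_results(results)
```

Sub-environments that did not report a key get a placeholder value (zero here) together with a False entry in the mask, so the mask, not the value, says whether the data is real.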

To create vector environments, use make_vec(), the vector environment equivalent of make(). It accepts several additional arguments for modifying the environment, the number of environments, the vectorizer type and the vectorizer arguments.

Note

Before v0.25, the info returned by reset() and step() was a list of dictionaries, one for each sub-environment. From v0.25 onwards, it is a single dictionary with a NumPy array for each key. To use the old info style, apply the DictInfoToList wrapper.
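The difference between the two info styles can be sketched in a few lines. The helper dict_info_to_list below is hypothetical and is not the DictInfoToList implementation; it assumes flat (non-nested) info dicts and that only the "_key" masks start with an underscore.

```python
import numpy as np


def dict_info_to_list(infos, num_envs):
    """Turn a batched info dict (key -> array, plus "_key" masks) into one dict per sub-environment."""
    list_info = [{} for _ in range(num_envs)]
    for key, values in infos.items():
        if key.startswith("_"):
            continue  # skip the masks themselves (assumes no real key starts with "_")
        # Without a mask, assume every sub-environment provided the key
        mask = infos.get(f"_{key}", np.ones(num_envs, dtype=bool))
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info


# New (v0.25+) style: one array per key, plus a mask of which sub-environments set it
batched = {
    "episode_length": np.array([0, 12, 0]),
    "_episode_length": np.array([False, True, False]),
}
# Old style: a list with one dict per sub-environment
per_env = dict_info_to_list(batched, num_envs=3)
```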

Examples

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.TimeAwareObservation,))
>>> envs = gym.wrappers.vector.ClipReward(envs, min_reward=0.2, max_reward=0.8)
>>> envs
<ClipReward, SyncVectorEnv(CartPole-v1, num_envs=3)>
>>> envs.num_envs
3
>>> envs.action_space
MultiDiscrete([2 2 2])
>>> envs.observation_space
Box([[-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
   0.00000000e+00]
 [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
   0.00000000e+00]
 [-4.80000019e+00 -3.40282347e+38 -4.18879032e-01 -3.40282347e+38
   0.00000000e+00]], [[4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
  5.00000000e+02]
 [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
  5.00000000e+02]
 [4.80000019e+00 3.40282347e+38 4.18879032e-01 3.40282347e+38
  5.00000000e+02]], (3, 5), float64)
>>> observations, infos = envs.reset(seed=123)
>>> observations
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282,  0.        ],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598,  0.        ],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924,  0.        ]])
>>> infos
{}
>>> _ = envs.action_space.seed(123)
>>> actions = envs.action_space.sample()
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.01734283,  0.15089367, -0.02859527, -0.33293587,  1.        ],
       [ 0.02909703, -0.16717631,  0.04740972,  0.3319138 ,  1.        ],
       [ 0.03516225, -0.19559774, -0.01162461,  0.25715804,  1.        ]])
>>> rewards
array([0.8, 0.8, 0.8])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
>>> envs.close()

To avoid waiting for all sub-environments to terminate before resetting, implementations autoreset sub-environments at episode end (when terminated or truncated is True). As a result, when adding observations to a replay buffer, you need to know whether the observation (and info) of each sub-environment is the first observation after an autoreset. We recommend storing this information in an additional variable.

Vector environments have the following additional attributes to help users understand the implementation:

num_envs - The number of sub-environments in the vector environment
observation_space - The batched observation space of the vector environment
single_observation_space - The observation space of a single sub-environment
action_space - The batched action space of the vector environment
single_action_space - The action space of a single sub-environment

Methods

VectorEnv.step(actions: ActType) → tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]]

Take an action for each parallel environment.

Parameters: actions – Batch of actions with the action_space shape.
Returns: Batch of (observations, rewards, terminations, truncations, infos)

Note

Because vector environments autoreset terminated or truncated sub-environments, the reset happens on the next call to step() after terminated or truncated is True.

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1], dtype=np.int32)
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}

VectorEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]]

Reset all parallel environments and return a batch of initial observations and info.

Parameters:

seed – The reset seed for the sub-environments
options – Additional information used to configure how the sub-environments are reset

Returns:

A batch of observations and info from the vectorized environment.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> observations, infos = envs.reset(seed=42)
>>> observations
array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32)
>>> infos
{}

VectorEnv.render() → tuple[RenderFrame, ...] | None

Returns the rendered frames from the parallel environments.

Returns: A tuple of rendered frames from the parallel environments

VectorEnv.close(**kwargs: Any)

Close all parallel environments and release resources.

It also closes any existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function does not close the sub-environments itself; that is handled in close_extras(). This behaviour is the same for synchronous and asynchronous vectorized environments.

Note

This is called automatically when the vector environment is garbage collected or the program exits.

Parameters: **kwargs – Keyword arguments passed to close_extras()

Attributes

VectorEnv.num_envs: int – The number of sub-environments in the vector environment.

VectorEnv.action_space: gym.Space – The (batched) action space. The input actions of step must be valid elements of action_space.

VectorEnv.observation_space: gym.Space – The (batched) observation space. The observations returned by reset and step are valid elements of observation_space.

VectorEnv.single_action_space: gym.Space – The action space of a sub-environment.

VectorEnv.single_observation_space: gym.Space – The observation space of a sub-environment.

VectorEnv.spec: EnvSpec | None = None – The EnvSpec of the environment, normally set during gymnasium.make_vec().

VectorEnv.metadata: dict[str, Any] = {} – The metadata of the environment, containing rendering modes, rendering fps, etc.

VectorEnv.render_mode: str | None = None – The render mode of the environment, which should follow similar specifications to Env.render_mode.

VectorEnv.closed: bool = False – Whether the vector environment has already been closed.

Additional Methods

property VectorEnv.unwrapped – Return the base environment.

property VectorEnv.np_random: Generator

Returns the environment’s internal _np_random, initialising it with a random seed if it has not been set.

Returns: An instance of np.random.Generator

property VectorEnv.np_random_seed: int | None

Returns the environment’s internal _np_random_seed, first initialising it with a random int seed if it has not been set.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns: int – The seed of the current np_random, or -1 if the seed of the RNG is unknown

Making Vector Environments

To create vector environments, gymnasium provides gymnasium.make_vec() as an equivalent function to gymnasium.make().