Vector#

Gymnasium.vector.VectorEnv#

class gymnasium.vector.VectorEnv(num_envs: int, observation_space: Space, action_space: Space)#

Base class for vectorized environments to run multiple independent copies of the same environment in parallel.

Vector environments can provide a linear speed-up in the steps taken per second by sampling multiple sub-environments at the same time. To prevent terminated environments from waiting until all sub-environments have terminated or truncated, the vector environments autoreset sub-environments after they terminate or truncate. As a result, the final step’s observation and info are overwritten by the reset’s observation and info. Therefore, the observation and info for the final step of a sub-environment are stored in the info parameter, under the keys “final_observation” and “final_info” respectively. See step() for more information.
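The autoreset behaviour described above can be illustrated with a toy, pure-Python sketch. CountdownEnv and ToySyncVector are hypothetical stand-ins invented for this example, not Gymnasium classes, and the info layout is simplified to plain lists (one slot per sub-environment, None where nothing was recorded).

```python
class CountdownEnv:
    """Hypothetical toy environment: the observation is the step count and
    the episode terminates once `length` steps have been taken."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {"t": self.t}


class ToySyncVector:
    """Hypothetical serial vector wrapper with autoreset (illustration only)."""

    def __init__(self, envs):
        self.envs = envs
        self.num_envs = len(envs)

    def reset(self):
        return [env.reset()[0] for env in self.envs], {}

    def step(self, actions):
        obs, rewards, terms, truncs = [], [], [], []
        infos = {
            "final_observation": [None] * self.num_envs,
            "final_info": [None] * self.num_envs,
        }
        for i, (env, action) in enumerate(zip(self.envs, actions)):
            o, r, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                # Keep the last step's data before the reset overwrites it.
                infos["final_observation"][i] = o
                infos["final_info"][i] = info
                o, _ = env.reset()
            obs.append(o)
            rewards.append(r)
            terms.append(terminated)
            truncs.append(truncated)
        return obs, rewards, terms, truncs, infos


vec = ToySyncVector([CountdownEnv(2), CountdownEnv(3)])
vec.reset()
vec.step([0, 0])
obs, rewards, terms, truncs, infos = vec.step([0, 0])
print(obs)                         # [0, 2]: sub-environment 0 was reset
print(terms)                       # [True, False]
print(infos["final_observation"])  # [2, None]
```

The returned observation for sub-environment 0 is already the first observation of the next episode, while the observation that ended the episode is only reachable through the info dictionary.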

The vector environments batch observations, rewards, terminations, truncations and info for each parallel environment. In addition, step() expects to receive a batch of actions for each parallel environment.

Gymnasium contains two types of Vector environments: AsyncVectorEnv and SyncVectorEnv.

The Vector Environments have the following additional attributes to help users understand the implementation:

num_envs - The number of sub-environments in the vector environment
observation_space - The batched observation space of the vector environment
single_observation_space - The observation space of a single sub-environment
action_space - The batched action space of the vector environment
single_action_space - The action space of a single sub-environment

Note

The info parameter of reset() and step() was originally (before OpenAI Gym v25) a list of dictionaries, one per sub-environment. In OpenAI Gym v25+ and in Gymnasium, this was changed to a single dictionary with a NumPy array for each key. To use the old info style, use the VectorListInfo wrapper.
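The difference between the two info styles can be sketched in plain Python. This assumes the dictionary style marks which sub-environments actually supplied a key with a boolean mask stored under the same key prefixed by an underscore (e.g. "_lives"). dict_info_to_list is a hypothetical helper written for this illustration, not the VectorListInfo implementation, and plain lists stand in for NumPy arrays.

```python
def dict_info_to_list(infos, num_envs):
    """Convert a dict of per-key batches into one dict per sub-environment."""
    list_info = [{} for _ in range(num_envs)]
    for key, values in infos.items():
        if key.startswith("_"):
            continue  # masks are consumed below, not copied
        # Assumed convention: "_<key>" flags which sub-environments set <key>.
        mask = infos.get("_" + key, [True] * num_envs)
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info


infos = {"lives": [3, 0, 2], "_lives": [True, False, True]}
print(dict_info_to_list(infos, num_envs=3))
# [{'lives': 3}, {}, {'lives': 2}]
```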

Note

To render the sub-environments, use call() with “render” as the argument. Remember to set the render mode for all the sub-environments during initialization.

Note

All parallel environments should share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

Parameters:

num_envs – Number of environments in the vectorized environment.
observation_space – Observation space of a single environment.
action_space – Action space of a single environment.

Methods#

VectorEnv.reset(*, seed: int | List[int] | None = None, options: dict | None = None)#

Reset all parallel environments and return a batch of initial observations and info.

Parameters:

seed – The reset seed(s) for the environments: a single int, a list of ints, or None
options – The options passed on to the reset of the sub-environments

Returns:

A batch of observations and info from the vectorized environment.
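How a seed argument might be spread over the sub-environments can be sketched as follows. expand_seeds is a hypothetical helper, and the rule that an integer seed becomes seed + i for sub-environment i is an assumption about the built-in implementations rather than something this page states.

```python
def expand_seeds(seed, num_envs):
    """Return one seed (or None) per sub-environment."""
    if seed is None:
        return [None] * num_envs
    if isinstance(seed, int):
        # Assumed rule: offset the base seed so sub-environments differ.
        return [seed + i for i in range(num_envs)]
    seeds = list(seed)
    if len(seeds) != num_envs:
        raise ValueError(f"expected {num_envs} seeds, got {len(seeds)}")
    return seeds


print(expand_seeds(42, 3))       # [42, 43, 44]
print(expand_seeds(None, 2))     # [None, None]
print(expand_seeds([7, 8], 2))   # [7, 8]
```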

Example

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset(seed=42)
(array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32), {})

VectorEnv.step(actions) → Tuple[Any, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], dict]#

Take an action for each parallel environment.

Parameters:

actions – Batch of actions, an element of action_space.

Returns:

Batch of (observations, rewards, terminations, truncations, infos)

Note

As the vector environments autoreset terminating and truncating sub-environments, the returned observation and info are not the final step’s observation and info, which are instead stored in info under “final_observation” and “final_info”.

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, termination, truncation, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> termination
array([False, False, False])
>>> truncation
array([False, False, False])
>>> infos
{}

VectorEnv.close(**kwargs)#

Close all parallel environments and release resources.

It also closes all the existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This is generic for both synchronous and asynchronous vectorized environments.

Note

This will be called automatically when the object is garbage collected or the program exits.

Parameters:

**kwargs – Keyword arguments passed to close_extras()
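The division of labour between close() and close_extras() can be sketched with a toy base class. ToyVectorEnv and ToySyncEnv are hypothetical names; the sketch mirrors only the behaviour described above (call close_extras(), then set closed to True) and leaves out image viewers.

```python
class ToyVectorEnv:
    """Hypothetical base class showing the close()/close_extras() split."""

    def __init__(self):
        self.closed = False

    def close_extras(self, **kwargs):
        """Hook for subclasses: release sub-environments, processes, etc."""

    def close(self, **kwargs):
        if self.closed:
            return  # closing twice is a no-op in this sketch
        self.close_extras(**kwargs)
        self.closed = True


class ToySyncEnv(ToyVectorEnv):
    def __init__(self, resources):
        super().__init__()
        self.resources = resources

    def close_extras(self, **kwargs):
        # The subclass, not close(), is responsible for the actual cleanup.
        self.resources.clear()


env = ToySyncEnv(["env-0", "env-1"])
env.close()
print(env.closed, env.resources)  # True []
```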

Attributes#

action_space#

The (batched) action space. The input actions of step must be valid elements of action_space:

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.action_space
MultiDiscrete([2 2 2])

observation_space#

The (batched) observation space. The observations returned by reset and step are valid elements of observation_space:

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)

single_action_space#

The action space of a single sub-environment:

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_action_space
Discrete(2)

single_observation_space#

The observation space of a single sub-environment:

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)

Making Vector Environments#

gymnasium.vector.make(id: str, num_envs: int = 1, asynchronous: bool = True, wrappers: Callable[[Env], Env] | List[Callable[[Env], Env]] | None = None, disable_env_checker: bool | None = None, **kwargs) → VectorEnv#

Create a vectorized environment from multiple copies of an environment, from its id.

Parameters:

id – The environment ID. This must be a valid ID from the registry.
num_envs – Number of copies of the environment.
asynchronous – If True, wraps the environments in an AsyncVectorEnv (which uses multiprocessing to run the environments in parallel). If False, wraps the environments in a SyncVectorEnv.
wrappers – If not None, then apply the wrappers to each internal environment during creation.
disable_env_checker – Whether to disable the environment checker, which is run on the first environment only. None defaults to the environment spec’s disable_env_checker parameter (False by default); otherwise this argument is used (True = do not run, False = run).
**kwargs – Keyword arguments passed to gym.make

Returns:

The vectorized environment.

Example

>>> import gymnasium as gym
>>> env = gym.vector.make('CartPole-v1', num_envs=3)
>>> env.reset(seed=42)
(array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32), {})

Async Vector Env#

class gymnasium.vector.AsyncVectorEnv(env_fns: Sequence[Callable[[], Env]], observation_space: Space | None = None, action_space: Space | None = None, shared_memory: bool = True, copy: bool = True, context: str | None = None, daemon: bool = True, worker: Callable | None = None)#

Vectorized environment that runs multiple environments in parallel.

It uses multiprocessing processes, and pipes for communication.

Example

>>> import gymnasium as gym
>>> env = gym.vector.AsyncVectorEnv([
...     lambda: gym.make("Pendulum-v1", g=9.81),
...     lambda: gym.make("Pendulum-v1", g=1.62)
... ])
>>> env.reset(seed=42)
(array([[-0.14995256,  0.9886932 , -0.12224312],
       [ 0.5760367 ,  0.8174238 , -0.91244936]], dtype=float32), {})
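The process-and-pipe design mentioned above can be sketched with the standard library alone. toy_worker and run_toy_async are hypothetical names, each "environment" is just a counter, and the "fork" start method is assumed to be available (POSIX platforms only); this is an illustration of the communication pattern, not the AsyncVectorEnv worker.

```python
import multiprocessing as mp


def toy_worker(conn, start):
    """Hypothetical worker: owns one counter "environment" and serves commands."""
    state = start
    while True:
        command, data = conn.recv()
        if command == "step":
            state += data
            conn.send(state)
        elif command == "close":
            conn.close()
            break


def run_toy_async(starts, actions):
    # Assumption: the "fork" start method exists on this platform.
    ctx = mp.get_context("fork")
    pipes, processes = [], []
    for start in starts:
        parent_conn, child_conn = ctx.Pipe()
        process = ctx.Process(target=toy_worker, args=(child_conn, start), daemon=True)
        process.start()
        child_conn.close()  # the parent only uses its own end of the pipe
        pipes.append(parent_conn)
        processes.append(process)

    # Send every action first, then collect, so the workers run concurrently.
    for pipe, action in zip(pipes, actions):
        pipe.send(("step", action))
    results = [pipe.recv() for pipe in pipes]

    for pipe in pipes:
        pipe.send(("close", None))
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    print(run_toy_async(starts=[0, 10], actions=[1, 2]))  # [1, 12]
```

Sending all commands before reading any reply is what lets the sub-environments step in parallel; reading each reply immediately after sending would serialize them again.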

Parameters:

env_fns – Functions that create the environments.
observation_space – Observation space of a single environment. If None, then the observation space of the first environment is taken.
action_space – Action space of a single environment. If None, then the action space of the first environment is taken.
shared_memory – If True, then the observations from the worker processes are communicated back through shared variables. This can improve the efficiency if the observations are large (e.g. images).
copy – If True, then the reset() and step() methods return a copy of the observations.
context – Context for multiprocessing. If None, then the default context is used.
daemon – If True, then subprocesses have the daemon flag turned on; that is, they will quit if the head process quits. However, daemon=True prevents subprocesses from spawning children, so for some environments you may want to set it to False.
worker – If set, then use that worker in a subprocess instead of a default one. Can be useful to override some inner vector env logic, for instance, how resets on termination or truncation are handled.

Warnings: worker is an advanced mode option. It provides a high degree of flexibility and a high chance to shoot yourself in the foot; thus, if you are writing your own worker, it is recommended to start from the code for the _worker (or _worker_shared_memory) method and add changes.

Raises:

RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).
ValueError – If observation_space is a custom space (i.e. not a default space in Gymnasium, such as gymnasium.spaces.Box, gymnasium.spaces.Discrete, or gymnasium.spaces.Dict) and shared_memory is True.

Sync Vector Env#

class gymnasium.vector.SyncVectorEnv(env_fns: Iterable[Callable[[], Env]], observation_space: Space | None = None, action_space: Space | None = None, copy: bool = True)#

Vectorized environment that serially runs multiple environments.

Example

>>> import gymnasium as gym
>>> env = gym.vector.SyncVectorEnv([
...     lambda: gym.make("Pendulum-v1", g=9.81),
...     lambda: gym.make("Pendulum-v1", g=1.62)
... ])
>>> env.reset(seed=42)
(array([[-0.14995256,  0.9886932 , -0.12224312],
       [ 0.5760367 ,  0.8174238 , -0.91244936]], dtype=float32), {})

Parameters:

env_fns – Iterable of callables that create the environments.
observation_space – Observation space of a single environment. If None, then the observation space of the first environment is taken.
action_space – Action space of a single environment. If None, then the action space of the first environment is taken.
copy – If True, then the reset() and step() methods return a copy of the observations.
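Why copy matters can be shown with a toy buffer. ToyBufferedEnv is a hypothetical class that, like the description above, returns either its internal observation buffer or a copy of it; it is a sketch of the aliasing issue, not the SyncVectorEnv code.

```python
import copy as copy_module


class ToyBufferedEnv:
    """Hypothetical vector env that updates one observation buffer in place."""

    def __init__(self, num_envs, copy=True):
        self.copy = copy
        self._observations = [0] * num_envs

    def step(self):
        for i in range(len(self._observations)):
            self._observations[i] += 1  # written in place on every step
        if self.copy:
            return copy_module.deepcopy(self._observations)
        return self._observations


safe = ToyBufferedEnv(2, copy=True)
first_safe = safe.step()
safe.step()

shared = ToyBufferedEnv(2, copy=False)
first_shared = shared.step()
shared.step()

print(first_safe)    # [1, 1]: unaffected by the second step
print(first_shared)  # [2, 2]: the same buffer, overwritten by the second step
```

Skipping the copy saves an allocation per step, at the cost that previously returned observations must be consumed (or copied by the caller) before the next step.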

Raises:

RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).