Gymnasium Release Notes¶
v1.0.0¶
Released on 2024-10-08 - GitHub - PyPI
v1.0.0 release notes
Over the last few years, the volunteer team behind Gym and Gymnasium has worked to fix bugs, improve the documentation, add new features, and change the API where appropriate so that the benefits outweigh the costs. This is the complete release of v1.0.0, which marks the end of this road of changes to the project's central API (Env, Space, VectorEnv). In addition, the release includes over 200 PRs since 0.29.1, with many bug fixes, new features, and improved documentation. So, thank you to all the volunteers for the hard work that has made this possible. The rest of these release notes cover the core API changes, followed by the additional new features, bug fixes, deprecations, and documentation changes.
Finally, we have published a paper on Gymnasium, discussing its overall design decisions and more at https://arxiv.org/abs/2407.17032, which can be cited using the following:
@misc{towers2024gymnasium,
title={Gymnasium: A Standard Interface for Reinforcement Learning Environments},
author={Mark Towers and Ariel Kwiatkowski and Jordan Terry and John U. Balis and Gianluca De Cola and Tristan Deleu and Manuel Goulão and Andreas Kallinteris and Markus Krimmel and Arjun KG and Rodrigo Perez-Vicente and Andrea Pierré and Sander Schulhoff and Jun Jet Tai and Hannah Tan and Omar G. Younis},
year={2024},
eprint={2407.17032},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.17032},
}
Removing The Plugin System
Within Gym v0.23+ and Gymnasium v0.26 to v0.29, an undocumented feature for registering external environments behind the scenes has been removed. Users of Atari (ALE), Minigrid or HighwayEnv could previously use the following code:
import gymnasium as gym
env = gym.make("ALE/Pong-v5")
Despite Atari never being imported (i.e., import ale_py), users could still create an Atari environment. This feature has been removed in v1.0.0, which requires users to update to
import gymnasium as gym
import ale_py
gym.register_envs(ale_py) # optional, helpful for IDEs or pre-commit
env = gym.make("ALE/Pong-v5")
Alternatively, users can use the format module_name:env_id so that the module is imported before the environment is created, e.g., ale_py:ALE/Pong-v5.
import gymnasium as gym
env = gym.make("ale_py:ALE/Pong-v5")
When importing modules to register environments (e.g., import ale_py), IDEs (e.g., VSCode, PyCharm) and pre-commit tools (isort / black / flake8) can believe that the import is pointless and should be removed. Therefore, we have introduced gymnasium.register_envs as a no-op function (the function literally does nothing) to make the IDE believe that something is happening and the import statement is required.
Vector Environments
To increase the sample speed of an environment, vectorizing is one of the easiest ways to sample multiple instances of the same environment simultaneously. Gym and Gymnasium provide the VectorEnv as a base class for this, but one of its issues has been that it inherited Env. This can cause particular issues with type checking (the return type of step is different for Env and VectorEnv), testing the environment type (isinstance(env, Env) can be true for vector environments despite the two acting differently), and finally wrappers (some Gym and Gymnasium wrappers supported vector environments, but there was no clear or consistent API for determining which did or didn't). Therefore, we have separated out Env and VectorEnv to not inherit from each other.
In implementing the new separate VectorEnv class, we have tried to minimize the difference between code using Env and VectorEnv along with making it more generic in places. The class contains the same attributes and methods as Env in addition to the attributes num_envs: int, single_action_space: gymnasium.Space and single_observation_space: gymnasium.Space. Further, we have removed several functions from VectorEnv that are not needed for all vector implementations: step_async, step_wait, reset_async, reset_wait, call_async and call_wait. This change now allows users to write their own custom vector environments; v1.0.0 includes an example vector CartPole environment written solely with NumPy that runs thousands of times faster than using Gymnasium's Sync vector environment.
To allow users to create vectorized environments easily, we provide gymnasium.make_vec as a vectorized equivalent of gymnasium.make. As there are multiple different vectorization options ("sync", "async", and a custom class referred to as "vector_entry_point"), the argument vectorization_mode selects how the environment is vectorized. This defaults to None such that if the environment has a vector entry point for a custom vector environment implementation, this will be utilized first (currently, CartPole is the only environment with a vector entry point built into Gymnasium). Otherwise, the synchronous vectorizer is used (previously, the Gym and Gymnasium vector.make used the asynchronous vectorizer as default). For more information, see the function docstring. We are excited to see other projects utilize this option to make creating their environments easier.
env = gym.make("CartPole-v1")
env = gym.wrappers.ClipReward(env, min_reward=-1, max_reward=3)
envs = gym.make_vec("CartPole-v1", num_envs=3)
envs = gym.wrappers.vector.ClipReward(envs, min_reward=-1, max_reward=3)
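As a minimal sketch of the different vectorization modes and the new attributes, building on the example above:

import gymnasium as gym

# Default (None): use the environment's vector entry point if one exists,
# otherwise fall back to the synchronous vectorizer.
envs = gym.make_vec("CartPole-v1", num_envs=3)

# Explicitly request the synchronous or asynchronous vectorizer.
envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="async")

# The resulting VectorEnv exposes both per-sub-environment and batched spaces.
print(envs.num_envs)                  # 3
print(envs.single_observation_space)  # space of a single sub-environment
print(envs.observation_space)         # batched space across all sub-environments
envs.close()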
Due to this split of Env and VectorEnv, there are now Env-only wrappers and VectorEnv-only wrappers in gymnasium.wrappers and gymnasium.wrappers.vector respectively. Furthermore, we updated the names of the base vector wrappers from VectorEnvWrapper to VectorWrapper and added VectorObservationWrapper, VectorRewardWrapper and VectorActionWrapper classes. See the vector wrapper page for more information.
To increase the efficiency of vector environments, autoreset is a common feature that allows sub-environments to reset without requiring all sub-environments to finish before resetting them all. Previously in Gym and Gymnasium, auto-resetting was done on the same step as the environment episode ends, such that the final observation and info would be stored in the step's info, i.e., info["final_observation"] and info["final_info"], with the standard obs and info containing the sub-environment's reset observation and info. Thus, accurately sampling observations from a vector environment required the following code (note the need to extract infos["final_observation"][j] if the sub-environment was terminated or truncated). Additionally, on-policy algorithms that use rollouts would require an additional forward pass to compute the correct next observation (this is often not done as an optimization, assuming that environments only terminate, not truncate).
replay_buffer = []
obs, _ = envs.reset()
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not (terminations[j] or truncations[j]):
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))
        else:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], infos["final_observation"][j]
            ))

    obs = next_obs
However, over time, the development team has recognized the inefficiency of this approach (primarily due to the extensive use of a Python dictionary) and the annoyance of having to extract the final observation to train agents correctly. Therefore, in v1.0.0, we are modifying autoreset to align with specialized vector-only projects like EnvPool and SampleFactory, where the sub-environment doesn't reset until the next step. As a result, the following changes are required when sampling:
replay_buffer = []
obs, _ = envs.reset()
autoreset = np.zeros(envs.num_envs)
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, _ = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not autoreset[j]:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))

    obs = next_obs
    autoreset = np.logical_or(terminations, truncations)
For on-policy rollouts, accounting for autoreset requires masking the error for the first observation of a new episode (done[t+1]) to prevent computing the error between the last observation of one episode and the first observation of the next.
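A minimal sketch of this masking for one-step TD errors (the rollout arrays, value_fn and gamma are illustrative placeholders, not part of the Gymnasium API):

import numpy as np

# Rollout of length T from a vector environment using the new autoreset API:
# observations: (T + 1, num_envs, ...), rewards / terminations / truncations: (T, num_envs)
dones = np.logical_or(terminations, truncations)
# observations[t] is a reset observation whenever the previous step ended an episode.
autoreset = np.concatenate([np.zeros((1, envs.num_envs), dtype=bool), dones[:-1]])

values = np.stack([value_fn(observations[t]) for t in range(T + 1)])
td_errors = np.zeros((T, envs.num_envs))
for t in range(T):
    not_done = 1.0 - dones[t].astype(np.float32)
    delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
    # Mask the error on autoreset steps so no error spans an episode boundary.
    td_errors[t] = delta * (1.0 - autoreset[t].astype(np.float32))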
Finally, we have improved the AsyncVectorEnv.set_attr and SyncVectorEnv.set_attr functions to use Wrapper.set_wrapper_attr, allowing users to set a variable anywhere in the environment stack if it already exists. Previously, this was not possible and users could only modify the variable in the "top" wrapper on the environment stack, importantly not the actual environment itself.
Wrappers
Previously, some wrappers could support both environments and vector environments; however, this was not standardized, and it was unclear which wrappers did and didn't support vector environments. For v1.0.0, with Env and VectorEnv separated to no longer inherit from each other (read more in the vector section), the wrappers in gymnasium.wrappers will only support standard environments, and gymnasium.wrappers.vector contains the provided specialized vector wrappers (most but not all wrappers are supported; please raise a feature request if you require one).
In v0.29, we deprecated the Wrapper.__getattr__ function, to be replaced by Wrapper.get_wrapper_attr, providing access to variables anywhere in the environment stack. In v1.0.0, we have added Wrapper.set_wrapper_attr as an equivalent function for setting a variable anywhere in the environment stack if it already exists; otherwise, the variable is assigned to the top wrapper.
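As a minimal sketch of the two functions (the length attribute belongs to the base CartPole environment):

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.wrappers.TimeAwareObservation(env)

# Read a variable defined anywhere in the wrapper / environment stack.
pole_length = env.get_wrapper_attr("length")

# Update the existing variable in-place on the base environment; if no wrapper
# or environment defined it, it would instead be assigned to the top wrapper.
env.set_wrapper_attr("length", 2 * pole_length)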
Most significantly, we have removed, renamed, and added several wrappers listed below.
- Removed wrappers
  - monitoring.VideoRecorder - The replacement wrapper is RecordVideo
  - StepAPICompatibility - We expect all Gymnasium environments to use the terminated / truncated step API, therefore, users shouldn't need the StepAPICompatibility wrapper. Shimmy includes a compatibility environment to convert Gym-API environments for Gymnasium.
- Renamed wrappers (we wished to make wrappers consistent in naming, therefore, we have removed "Wrapper" from all wrappers and included "Observation", "Action" and "Reward" within wrapper names where appropriate)
  - AutoResetWrapper -> Autoreset
  - FrameStack -> FrameStackObservation
  - PixelObservationWrapper -> AddRenderObservation
- Moved wrappers (all vector wrappers are in gymnasium.wrappers.vector)
  - VectorListInfo -> vector.DictInfoToList
- Added wrappers
  - DelayObservation - Adds a delay to the next observation and reward
  - DtypeObservation - Modifies the dtype of an environment's observation space
  - MaxAndSkipObservation - Will skip n observations and will max over the last 2 observations, inspired by the Atari environment heuristic, for other environments
  - StickyAction - Randomly repeats actions with a probability for a step, returning the final observation and sum of rewards over steps. Inspired by Atari environment heuristics
  - JaxToNumpy - Converts a Jax-based environment to use NumPy-based input and output data for reset, step, etc.
  - JaxToTorch - Converts a Jax-based environment to use PyTorch-based input and output data for reset, step, etc.
  - NumpyToTorch - Converts a NumPy-based environment to use PyTorch-based input and output data for reset, step, etc.
For all wrappers, we have added example code documentation and a changelog to help future researchers understand any changes made. See the following page for an example.
Functional Environments
One of the substantial advantages of Gymnasium's Env is that it generally requires minimal information about the underlying environment specifications; however, this can make applying such environments to planning, search algorithms, and theoretical investigations more difficult. We are proposing FuncEnv as an alternative definition to Env that is closer to a Markov decision process definition, exposing more functions to the user, including the observation, reward, and termination functions, along with the environment's raw state as a single object.
from typing import Any
import gymnasium as gym
from gymnasium.functional import StateType, ObsType, ActType, RewardType, TerminalType, Params
class ExampleFuncEnv(gym.functional.FuncEnv):
    def initial(self, rng: Any, params: Params | None = None) -> StateType:
        ...

    def transition(self, state: StateType, action: ActType, rng: Any, params: Params | None = None) -> StateType:
        ...

    def observation(self, state: StateType, rng: Any, params: Params | None = None) -> ObsType:
        ...

    def reward(
        self, state: StateType, action: ActType, next_state: StateType, rng: Any, params: Params | None = None
    ) -> RewardType:
        ...

    def terminal(self, state: StateType, rng: Any, params: Params | None = None) -> TerminalType:
        ...
FuncEnv requires that the initial and transition functions return a new state given their inputs, as a partial implementation of Env.reset and Env.step. As a result, users can sample (and save) the next state for a range of inputs to use with planning, searching, etc. Given a state, observation, reward, and terminal provide users explicit definitions to understand how each can affect the environment's output.
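A minimal sketch of stepping a FuncEnv by hand, assuming ExampleFuncEnv above is fully implemented for a discrete action space:

import numpy as np

func_env = ExampleFuncEnv()
rng = np.random.default_rng(seed=42)

state = func_env.initial(rng)
action = 0  # an illustrative action from the environment's action space
next_state = func_env.transition(state, action, rng)

obs = func_env.observation(next_state, rng)
rew = func_env.reward(state, action, next_state, rng)
done = func_env.terminal(next_state, rng)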
Collecting Seeding Values
It was possible to seed both environments and spaces with None to use a random initial seed value; however, it wasn't possible to know what these initial seed values were. We have addressed this for Space.seed and reset.seed in #1033 and #889. Additionally, for Space.seed, we have changed the return type to be specialized for each space such that the following code will work for all spaces.
seeded_values = space.seed(None)
initial_samples = [space.sample() for _ in range(10)]
reseed_values = space.seed(seeded_values)
reseed_samples = [space.sample() for _ in range(10)]
assert seeded_values == reseed_values
assert initial_samples == reseed_samples
Additionally, for environments, we have added a new np_random_seed attribute that will store the most recent np_random seed value from reset(seed=seed).
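A minimal sketch of the two additions:

import gymnasium as gym

env = gym.make("CartPole-v1")
env.reset(seed=None)       # a random initial seed is generated...
print(env.np_random_seed)  # ...and can now be recovered for reproducibility

env.reset(seed=123)
print(env.np_random_seed)  # 123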
Environment Version Changes
- It was discovered recently that the MuJoCo-based Pusher was not compatible with mujoco>=3 as the model's density for the block that the agent had to push was lighter than air. This obviously began to cause issues for users with mujoco>=3 and Pusher. Therefore, we have disabled the v4 environment for mujoco>=3 and updated the model in MuJoCo v5 so that it produces behavior more in line with v4 and mujoco<3 (#1019).
- New v5 MuJoCo environments as a follow-up to the v4 environments added two years ago, fixing inconsistencies, adding new features and updating the documentation (#572). Additionally, we have decided to mark the mujoco-py based (v2 and v3) environments as deprecated and plan to remove them from Gymnasium in the future (#926).
- Lunar Lander version increased from v2 to v3 due to two bug fixes. The first fixes the determinism of the environment, as the world object was not completely destroyed on reset, causing non-determinism in particular cases (#979). Second, the wind generation (turned off by default) was not randomly generated on each reset; we have updated this to gain statistical independence between episodes (#959).
- CarRacing version increased from v2 to v3 to change how the environment ends, such that when the agent completes the track the environment terminates rather than truncates.
- We have removed pip install "gymnasium[accept-rom-license]" as ale-py>=0.9 now comes packaged with the ROMs, meaning that users don't need to install the Atari ROMs separately with AutoROM.
Additional Bug Fixes
- Fix spaces.Box allowing low and high values outside the dtype's range, which could result in some very strange edge cases that were very difficult to detect, by @pseudo-rnd-thoughts (#774)
- Limit the cython version for gymnasium[mujoco-py] due to cython==3 issues by @pseudo-rnd-thoughts (#616)
- Fix mujoco rendering with custom width values by @logan-dunbar (#634)
- Fix environment checker to correctly report infinite bounds by @chrisyeh96 (#708)
- Fix type hint for register(kwargs) from **kwargs to kwargs: dict | None = None by @younik (#788)
- Fix registration in AsyncVectorEnv for custom environments by @RedTachyon (#810)
- Remove mujoco-py import error for v4+ MuJoCo environments by @MischaPanch (#934)
- Fix reading shared memory for Tuple and Dict spaces (#941)
- Fix Multidiscrete.from_jsonable on Windows (#932)
- Remove play rendering normalization (#956)
- Fix unused device argument in to_torch conversion by @mantasu (#1107)
- Fix torch to numpy conversion when on GPU by @mantasu (#1109)
Additional new features
- Added Python 3.12 and NumPy 2.0 support by @RedTachyon in #1094
- Add support in MuJoCo human rendering to change the size of the viewing window by @logan-dunbar (#635)
- Add more control in MuJoCo rendering over offscreen dimensions and scene geometries by @guyazran (#731)
- Add stack trace reporting to AsyncVectorEnv by @pseudo-rnd-thoughts in #1119
- Add support to handle NamedTuples in JaxToNumpy, JaxToTorch and NumpyToTorch by @RogerJL (#789) and @pseudo-rnd-thoughts (#811)
- Add padding_type parameter to FrameStackObservation to select the padding observation by @jamartinh (#830)
- Add render check to check_environments_match by @Kallinteris-Andreas (#748)
- Add a new OneOf space that provides exclusive unions of spaces by @RedTachyon and @pseudo-rnd-thoughts (#812); see the sketch after this list
- Update Dict.sample to use standard Python dicts rather than OrderedDict due to dropping Python 3.7 support by @pseudo-rnd-thoughts (#977)
- Jax environments return Jax data rather than NumPy data by @RedTachyon and @pseudo-rnd-thoughts (#817)
- Add wrappers.vector.HumanRendering and remove human rendering from CartPoleVectorEnv by @pseudo-rnd-thoughts and @TimSchneider42 (#1013)
- Add more helpful error messages if users use a mixture of Gym and Gymnasium by @pseudo-rnd-thoughts (#957)
- Add sutton_barto_reward argument for CartPole that changes the reward function to not return 1 on terminating states by @Kallinteris-Andreas (#958)
- Add visual_options rendering argument for MuJoCo environments by @Kallinteris-Andreas (#965)
- Add exact argument to utils.env_checker.data_equivalence by @Kallinteris-Andreas (#924)
- Update wrapper.NormalizeObservation observation space and change observation to float32 by @pseudo-rnd-thoughts (#978)
- Catch exception during env.spec if kwarg is unpickleable by @pseudo-rnd-thoughts (#982)
- Improve ImportError for Box2D by @turbotimon (#1009)
- Add an option for a tuple of (int, int) screen sizes in the AtariPreprocessing wrapper by @pseudo-rnd-thoughts (#1105)
- Add is_slippery option for the CliffWalking environment by @CloseChoice (#1087)
- Update RescaleAction and RescaleObservation to support np.inf bounds by @TimSchneider42 (#1095)
- Update determinism check for env.reset(seed=42); env.reset() by @qgallouedec (#1086)
- Refactor MuJoCo to remove the BaseMujocoEnv class by @Kallinteris-Andreas (#1075)
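As a minimal sketch of the new OneOf space mentioned above (assuming samples are returned as a (sub-space index, sample) pair):

from gymnasium.spaces import Box, Discrete, OneOf

# An exclusive union: each sample is drawn from exactly one sub-space.
space = OneOf((Discrete(2), Box(-1.0, 1.0, shape=(2,))), seed=123)

index, sample = space.sample()  # index identifies the sampled sub-space
print(index, sample)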
Deprecation
- Remove unnecessary error classes in error.py by @pseudo-rnd-thoughts (#801)
- Stop exporting MuJoCo v2 environment classes from gymnasium.envs.mujoco by @Kallinteris-Andreas (#827)
- Remove deprecation warning from PlayPlot by @pseudo-rnd-thoughts (#800)
Documentation changes
- Updated the custom environment tutorial for v1.0.0 by @kir0ul (#709)
- Add swig to installation instructions for Box2D by @btjanaka (#683)
- Add tutorial "Load custom quadruped robot environments" using the Gymnasium/MuJoCo/Ant-v5 framework by @Kallinteris-Andreas (#838)
- Add a third-party tutorial page to list tutorials written and hosted on other websites by @pseudo-rnd-thoughts (#867)
- Add more introductory pages by @pseudo-rnd-thoughts (#791)
- Add figures for each MuJoCo environment representing their action space by @Kallinteris-Andreas (#762)
- Fix the documentation on Blackjack's starting state by @pseudo-rnd-thoughts (#893)
- Update Taxi environment documentation to clarify the starting state definition by @britojr in #1120
- Fix the documentation on FrozenLake and CliffWalking's position by @PierreCounathe (#695)
- Update the classic control environments' __init__ and reset arguments by @pseudo-rnd-thoughts (#898)
Full Changelog: v0.29.1...v1.0.0
v1.0.0a2: v1.0.0 alpha 2¶
Released on 2024-05-21 - GitHub - PyPI
This is our second alpha version, which we hope will be the last before the full Gymnasium v1.0.0 release. We summarise the key changes, bug fixes and new features added in this alpha version.
Key Changes
Atari environments
ale-py, which provides the Atari environments, has been updated in v0.9.0 to use Gymnasium as the API backend. Furthermore, the pip install contains the ROMs, so all that should be necessary for installing Atari will be pip install "gymnasium[atari]" (as a result, gymnasium[accept-rom-license] has been removed). A reminder that for Gymnasium v1.0, to register the external environments (e.g., ale-py), you will be required to import ale_py before creating any of the Atari environments.
Collecting seeding values
It was possible to seed both environments and spaces with None to use a random initial seed value; however, it wasn't possible to know what these initial seed values were. We have addressed this for Space.seed and reset.seed in #1033 and #889. For Space.seed, we have changed the return type to be specialised for each space such that the following code will work for all spaces.
seeded_values = space.seed(None)
initial_samples = [space.sample() for _ in range(10)]
reseed_values = space.seed(seeded_values)
reseed_samples = [space.sample() for _ in range(10)]
assert seeded_values == reseed_values
assert initial_samples == reseed_samples
Additionally, for environments, we have added a new np_random_seed attribute that will store the most recent np_random seed value from reset(seed=seed).
Environment Version changes
- It was discovered recently that the mujoco-based Pusher was not compatible with MuJoCo >= 3 due to bug fixes that found the model density for a block that the agent had to push was the density of air. This obviously began to cause issues for users with MuJoCo v3+ and Pusher. Therefore, we have disabled the v4 environment for MuJoCo >= 3 and updated the model in MuJoCo v5 so that it produces behaviour more in line with v4 and MuJoCo < 3 (#1019).
- Alpha 2 includes new v5 MuJoCo environments as a follow-up to the v4 environments added two years ago, fixing inconsistencies, adding new features and updating the documentation. We have decided to mark the mujoco-py (v2 and v3) environments as deprecated and plan to remove them from Gymnasium in the future (#926).
- Lunar Lander version increased from v2 to v3 due to two bug fixes. The first fixes the determinism of the environment, as the world object was not completely destroyed on reset, causing non-determinism in particular cases (#979). Second, the wind generation (turned off by default) was not randomly generated on each reset; we have updated this to gain statistical independence between episodes (#959).
Box Samples
It was discovered that spaces.Box would allow low and high values outside the dtype's range (#774), which could result in some very strange edge cases that were very difficult to detect. We hope that these changes improve debugging and detecting invalid inputs to the space; however, let us know if your environment raises issues related to this.
Bug Fixes
- Updates CartPoleVectorEnv for the new autoreset API (#915)
- Fixed wrappers.vector.RecordEpisodeStatistics episode length computation with the new autoreset API (#1018)
- Remove mujoco-py import error for v4+ MuJoCo environments (#934)
- Fix make_vec(**kwargs) not being passed to vector entry point envs (#952)
- Fix reading shared memory for Tuple and Dict spaces (#941)
- Fix Multidiscrete.from_jsonable for Windows (#932)
- Remove play rendering normalisation (#956)
New Features
- Added Python 3.12 support
- Add a new OneOf space that provides exclusive unions of spaces (#812)
- Update Dict.sample to use standard Python dicts rather than OrderedDict due to dropping Python 3.7 support (#977)
- Jax environments return Jax data rather than NumPy data (#817)
- Add wrappers.vector.HumanRendering and remove human rendering from CartPoleVectorEnv (#1013)
- Add more helpful error messages if users use a mixture of Gym and Gymnasium (#957)
- Add sutton_barto_reward argument for CartPole that changes the reward function to not return 1 on terminating states (#958)
- Add visual_options rendering argument for MuJoCo environments (#965)
- Add exact argument to utils.env_checker.data_equivalence (#924)
- Update wrapper.NormalizeObservation observation space and change observation to float32 (#978)
- Catch exception during env.spec if kwarg is unpickleable (#982)
- Improve ImportError for Box2D (#1009)
- Added metadata field to VectorEnv and VectorWrapper (#1006)
- Fix make_vec for sync or async when modifying make arguments (#1027)
Full Changelog: v1.0.0a1...v1.0.0a2, v0.29.1...v1.0.0a2
v1.0.0a1: v1.0.0 alpha 1¶
Released on 2024-02-13 - GitHub - PyPI
Over the last few years, the volunteer team behind Gym and Gymnasium has worked to fix bugs, improve the documentation, add new features, and change the API where appropriate such that the benefits outweigh the costs. This is the first alpha release of v1.0.0, which aims to be the end of this road of changing the project's API, along with containing many new features and improved documentation.
To install v1.0.0a1, you must use pip install gymnasium==1.0.0a1 or pip install --pre gymnasium; otherwise, v0.29.1 will be installed. Similarly, the website will default to v0.29.1's documentation, which can be changed with the pop-up in the bottom right.
We are really interested in projects testing with these v1.0.0 alphas to find any bugs, missing documentation, or issues with the API changes before we release v1.0 in full.
Removing the plugin system
Within Gym v0.23+ and Gymnasium v0.26 to v0.29, an undocumented feature that has existed for registering external environments behind the scenes has been removed. Users of Atari (ALE), Minigrid or HighwayEnv could previously use the following code:
import gymnasium as gym
env = gym.make("ALE/Pong-v5")
such that despite Atari never being imported (i.e., import ale_py), users could still load an Atari environment. This feature has been removed in v1.0.0, which requires users to update to
import gymnasium as gym
import ale_py
gym.register_envs(ale_py) # optional
env = gym.make("ALE/Pong-v5")
Alternatively, users can do the following, where the ale_py within the environment id causes the module to be imported:
import gymnasium as gym
env = gym.make("ale_py:ALE/Pong-v5") # `module_name:env_id`
For users with IDEs (e.g., VSCode, PyCharm), import ale_py can cause the IDE (and pre-commit isort / black / flake8) to believe that the import statement does nothing. Therefore, we have introduced gymnasium.register_envs as a no-op function (the function literally does nothing) to make the IDE believe that something is happening and the import statement is required.
Note: ale-py, Minigrid, and HighwayEnv must be updated to work with Gymnasium v1.0.0; we hope to complete this for all affected projects by alpha 2.
Vector environments
To increase the sample speed of an environment, vectorizing is one of the easiest ways to sample multiple instances of the same environment simultaneously. Gym and Gymnasium provide the VectorEnv as a base class for this, but one of its issues has been that it inherited Env. This can cause particular issues with type checking (the return type of step is different for Env and VectorEnv), testing the environment type (isinstance(env, Env) can be true for vector environments despite the two acting differently) and finally wrappers (some Gym and Gymnasium wrappers supported vector environments, but there was no clear or consistent API for determining which did or didn't). Therefore, we have separated out Env and VectorEnv to not inherit from each other.
In implementing the new separate VectorEnv class, we have tried to minimize the difference between code using Env and VectorEnv along with making it more generic in places. The class contains the same attributes and methods as Env along with num_envs: int, single_action_space: gymnasium.Space and single_observation_space: gymnasium.Space. Additionally, we have removed several functions from VectorEnv that are not needed for all vector implementations: step_async, step_wait, reset_async, reset_wait, call_async and call_wait. This change now allows users to write their own custom vector environments; v1.0.0a1 includes an example vector CartPole environment that runs thousands of times faster than using Gymnasium's Sync vector environment.
To allow users to create vectorized environments easily, we provide gymnasium.make_vec as a vectorized equivalent of gymnasium.make. As there are multiple different vectorization options ("sync", "async", and a custom class referred to as "vector_entry_point"), the argument vectorization_mode selects how the environment is vectorized. This defaults to None such that if the environment has a vector entry point for a custom vector environment implementation, this will be utilized first (currently, CartPole is the only environment with a vector entry point built into Gymnasium). Otherwise, the synchronous vectorizer is used (previously, the Gym and Gymnasium vector.make used the asynchronous vectorizer as default). For more information, see the function docstring.
env = gym.make("CartPole-v1")
env = gym.wrappers.ClipReward(env, min_reward=-1, max_reward=3)
envs = gym.make_vec("CartPole-v1", num_envs=3)
envs = gym.wrappers.vector.ClipReward(envs, min_reward=-1, max_reward=3)
Due to this split of Env and VectorEnv, there are now Env-only wrappers and VectorEnv-only wrappers in gymnasium.wrappers and gymnasium.wrappers.vector respectively. Furthermore, we updated the names of the base vector wrappers from VectorEnvWrapper to VectorWrapper and added VectorObservationWrapper, VectorRewardWrapper and VectorActionWrapper classes. See the vector wrapper page for more information.
To increase the efficiency of vector environments, autoreset is a common feature that allows sub-environments to reset without requiring all sub-environments to finish before resetting them all. Previously in Gym and Gymnasium, auto-resetting was done on the same step as the environment episode ends, such that the final observation and info would be stored in the step's info, i.e., info["final_observation"] and info["final_info"], with the standard obs and info containing the sub-environment's reset observation and info. This required the following general approach when sampling from vectorized environments.
replay_buffer = []
obs, _ = envs.reset()
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not (terminations[j] or truncations[j]):
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))
        else:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], infos["final_observation"][j]
            ))

    obs = next_obs
However, over time, the development team has recognized the inefficiency of this approach (primarily due to the extensive use of a Python dictionary) and the annoyance of having to extract the final observation to train agents correctly. Therefore, in v1.0.0, we are modifying autoreset to align with specialized vector-only projects like EnvPool and SampleFactory such that the sub-environment doesn't reset until the next step. As a result, this requires the following changes when sampling:
replay_buffer = []
obs, _ = envs.reset()
autoreset = np.zeros(envs.num_envs)
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, _ = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not autoreset[j]:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))

    obs = next_obs
    autoreset = np.logical_or(terminations, truncations)
Finally, we have improved the AsyncVectorEnv.set_attr and SyncVectorEnv.set_attr functions to use Wrapper.set_wrapper_attr, allowing users to set variables anywhere in the environment stack if they already exist. Previously, this was not possible and users could only modify the variable in the "top" wrapper on the environment stack, importantly not the actual environment itself.
Wrappers
Previously, some wrappers could support both environments and vector environments; however, this was not standardized, and it was unclear which wrappers did and didn't support vector environments. For v1.0.0, with Env and VectorEnv separated to no longer inherit from each other (read more in the vector section), the wrappers in gymnasium.wrappers will only support standard environments, and gymnasium.wrappers.vector contains the provided specialized vector wrappers (most but not all wrappers are supported; please raise a feature request if you require one).
In v0.29, we deprecated the Wrapper.__getattr__ function, to be replaced by Wrapper.get_wrapper_attr, providing access to variables anywhere in the environment stack. In v1.0.0, we have added Wrapper.set_wrapper_attr as an equivalent function for setting a variable anywhere in the environment stack if it already exists; otherwise, the variable is set in the top wrapper (or environment).
Most significantly, we have removed, renamed, and added several wrappers listed below.
- Removed wrappers
  - monitoring.VideoRecorder - The replacement wrapper is RecordVideo
  - StepAPICompatibility - We expect all Gymnasium environments to use the terminated / truncated step API, therefore, users shouldn't need the StepAPICompatibility wrapper. Shimmy includes a compatibility environment to convert Gym-API environments for Gymnasium.
- Renamed wrappers (we wished to make wrappers consistent in naming, therefore, we have removed "Wrapper" from all wrappers and included "Observation", "Action" and "Reward" within wrapper names where appropriate)
  - AutoResetWrapper -> Autoreset
  - FrameStack -> FrameStackObservation
  - PixelObservationWrapper -> AddRenderObservation
- Moved wrappers (all vector wrappers are in gymnasium.wrappers.vector)
  - VectorListInfo -> vector.DictInfoToList
- Added wrappers
  - DelayObservation - Adds a delay to the next observation and reward
  - DtypeObservation - Modifies the dtype of an environment's observation space
  - MaxAndSkipObservation - Will skip n observations and will max over the last 2 observations, inspired by the Atari environment heuristic, for other environments
  - StickyAction - Randomly repeats actions with a probability for a step, returning the final observation and sum of rewards over steps. Inspired by Atari environment heuristics
  - JaxToNumpy - Converts a Jax-based environment to use NumPy-based input and output data for reset, step, etc.
  - JaxToTorch - Converts a Jax-based environment to use PyTorch-based input and output data for reset, step, etc.
  - NumpyToTorch - Converts a NumPy-based environment to use PyTorch-based input and output data for reset, step, etc.
For all wrappers, we have added example code documentation and a changelog to help future researchers understand any changes made. See the following page for an example.
Functional environments
One of the substantial advantages of Gymnasium's Env is that it generally requires minimal information about the underlying environment specifications; however, this can make applying such environments to planning, search algorithms, and theoretical investigations more difficult. We are proposing FuncEnv as an alternative definition to Env that is closer to a Markov decision process definition, exposing more functions to the user, including the observation, reward, and termination functions, along with the environment's raw state as a single object.
from typing import Any
import gymnasium as gym
from gymnasium.functional import StateType, ObsType, ActType, RewardType, TerminalType, Params
class ExampleFuncEnv(gym.functional.FuncEnv):
    def initial(self, rng: Any, params: Params | None = None) -> StateType:
        ...

    def transition(self, state: StateType, action: ActType, rng: Any, params: Params | None = None) -> StateType:
        ...

    def observation(self, state: StateType, params: Params | None = None) -> ObsType:
        ...

    def reward(
        self, state: StateType, action: ActType, next_state: StateType, params: Params | None = None
    ) -> RewardType:
        ...

    def terminal(self, state: StateType, params: Params | None = None) -> TerminalType:
        ...
FuncEnv requires that the initial and transition functions return a new state given their inputs, as a partial implementation of Env.reset and Env.step. As a result, users can sample (and save) the next state for a range of inputs to use with planning, searching, etc. Given a state, observation, reward, and terminal provide users explicit definitions to understand how each can affect the environment's output.
Additional bug fixes
- Limit the cython version for gymnasium[mujoco-py] due to cython==3 issues by @pseudo-rnd-thoughts (#616)
- Fix MuJoCo environment type issues by @Kallinteris-Andreas (#612)
- Fix mujoco rendering with custom width values by @logan-dunbar (#634)
- Fix environment checker to correctly report infinite bounds by @chrisyeh96 (#708)
- Fix type hint for register(kwargs) from **kwargs to kwargs: dict | None = None by @younik (#788)
- Fix CartPoleVectorEnv step counter to be set back to zero on reset by @TimSchneider42 (#886)
- Fix registration for async vector environments for custom environments by @RedTachyon (#810)
Additional new features
- New MuJoCo v5 environments (the changes and performance graphs will be included in a separate blog post) by @Kallinteris-Andreas (#572)
- Add support in MuJoCo human rendering for changing the size of the viewing window by @logan-dunbar (#635)
- Add more control in MuJoCo rendering over offscreen dimensions and scene geometries by @guyazran (#731)
- Add support to handle NamedTuples in JaxToNumpy, JaxToTorch and NumpyToTorch by @RogerJL (#789) and @pseudo-rnd-thoughts (#811)
- Add padding_type parameter to FrameStackObservation to select the padding observation by @jamartinh (#830)
- Add render check to check_environments_match by @Kallinteris-Andreas (#748)
Deprecation
- Remove unnecessary error classes in error.py by @pseudo-rnd-thoughts (#801)
- Stop exporting MuJoCo v2 environment classes from gymnasium.envs.mujoco by @Kallinteris-Andreas (#827)
- Remove deprecation warning from PlayPlot by @pseudo-rnd-thoughts (#800)
Documentation changes
- Updated the custom environment tutorial for v1.0.0 by @kir0ul (#709)
- Add swig to installation instructions for Box2D by @btjanaka (#683)
- Add tutorial "Load custom quadruped robot environments" using the Gymnasium/MuJoCo/Ant-v5 framework by @Kallinteris-Andreas (#838)
- Add a third-party tutorial page to list tutorials written and hosted on other websites by @pseudo-rnd-thoughts (#867)
- Add more introductory pages by @pseudo-rnd-thoughts (#791)
- Add figures for each MuJoCo environment representing their action space by @Kallinteris-Andreas (#762)
- Fix the documentation on Blackjack's starting state by @pseudo-rnd-thoughts (#893)
- Fix the documentation on FrozenLake and CliffWalking's position by @PierreCounathe (#695)
- Update the classic control environments' __init__ and reset arguments by @pseudo-rnd-thoughts (#898)
Full Changelog: v0.29.0...v1.0.0a1
v0.29.1 ¶
Released on 2023-08-21 - GitHub - PyPI
A minimal release that fixes a warning produced by Wrapper.__getattr__. This function will be removed in v1.0.0; however, the previously reported solution was incorrect, and the updated solution still caused the warning to show (due to technical Python reasons).
Changes
- The Wrapper.__getattr__ warning reported the incorrect new function, get_attr, rather than get_wrapper_attr
- When using get_wrapper_attr, the __getattr__ warning was still raised because get_wrapper_attr uses hasattr, which under the hood uses __getattr__. This has been updated to remove the unintended warning.
- Add warning to VectorEnvWrapper.__getattr__ to specify that it also is deprecated in v1.0.0
Full Changelog: v0.29.0...v0.29.1
v0.29.0¶
Released on 2023-07-14 - GitHub - PyPI
v0.29.0 Release notes
We finally have a software citation for Gymnasium, with the plan to release an associated paper after v1.0. Thank you to all the contributors over the last 3 years who have helped make Gym and Gymnasium possible (#590)
@misc{towers_gymnasium_2023,
title = {Gymnasium},
url = {https://zenodo.org/record/8127025},
abstract = {An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)},
urldate = {2023-07-08},
publisher = {Zenodo},
author = {Towers, Mark and Terry, Jordan K. and Kwiatkowski, Ariel and Balis, John U. and Cola, Gianluca de and Deleu, Tristan and Goulão, Manuel and Kallinteris, Andreas and KG, Arjun and Krimmel, Markus and Perez-Vicente, Rodrigo and Pierré, Andrea and Schulhoff, Sander and Tai, Jun Jet and Shen, Andrew Tan Jin and Younis, Omar G.},
month = mar,
year = {2023},
doi = {10.5281/zenodo.8127026},
}
Gymnasium has a conda package, conda install gymnasium. Thanks to @ChristofKaufmann for completing this.
Breaking Changes
- Drop support for Python 3.7, which has reached its end of life, by @Kallinteris-Andreas in #573
- Update MuJoCo Hopper & Walker2D models to work with MuJoCo >= 2.3.3 by @Kallinteris-Andreas in #589
- Add deprecation warnings to several features which will be removed in v1.0: Wrapper.__getattr__, gymnasium.make(..., autoreset=True), gymnasium.make(..., apply_api_compatibility=True), Env.reward_range and gymnasium.vector.make. For their proposed replacements, see #535
- Raise error for Box bounds of low > high, low == inf and high == -inf by @jjshoots in #495
- Add dtype testing for NumPy arrays in data_equivalence() by @pseudo-rnd-thoughts in #515
- Remove Jumpy from gymnasium wrappers as it was partially implemented with limited testing and usage by @pseudo-rnd-thoughts in #548
- Update project requirement to jax>=0.4 by @charraut in #373
New Features
- Remove the restrictions on pygame version, pygame>=2.1.3, by @pseudo-rnd-thoughts in #558
- Add start parameter to MultiDiscrete space, similar to the Discrete(..., start) parameter, by @Rayerdyne in #557
- Add testing to check_env that closing a closed environment doesn't raise an error by @pseudo-rnd-thoughts in #564
- On initialisation, wrapper.RecordVideo throws an error if the environment has an invalid render mode (None, "human", "ansi") by @robertoschiavone in #580
- Add MaxAndSkipObservation wrapper by @LucasAlegre in #561
- Add check_environments_match function for checking if two environments are identical by @Kallinteris-Andreas in #576
- Add performance debugging utilities, utils/performance.py, by @Kallinteris-Andreas in #583
- Added Jax-based cliff walking environment by @balisujohn in #407
- MuJoCo
  - Add support for relative paths with xml_file arguments by @Kallinteris-Andreas in #536
  - Add support for environments to specify info in reset by @Kallinteris-Andreas in #540
  - Remove requirement of environments defining metadata["render_fps"]; the value is determined on __init__ using dt by @Kallinteris-Andreas in #525
- Experimental
  - Add deprecated wrapper error in gymnasium.experimental.wrappers by @charraut in #341
  - Add fps argument to RecordVideoV0 for custom fps values that override an environment's internal render_fps value by @younik in #503
  - Add experimental vector wrappers for lambda observation, action and reward wrappers by @pseudo-rnd-thoughts in #444
Bug Fixes
- Fix spaces.Dict.keys() as key in keys was False by @pseudo-rnd-thoughts in #608
- Update the action space of wrappers.RescaleAction based on the bounds by @mmcaulif in #569
- Remove warnings in the passive environment checker for infinite Box bounds by @pseudo-rnd-thoughts in #435
- Revert Lunar Lander observation space change by @alexdlukens in #512
- Fix URL links in check_env by @robertoschiavone in #554
- Update shimmy[gym] to shimmy[gym-v21] or shimmy[gym-v26] by @elliottower in #433
- Fix several issues within the experimental vector environment and wrappers by @pseudo-rnd-thoughts in #516
- Video recorder wrapper
  - Fix VideoRecorder on reset to empty recorded_frames rather than frames by @voidflight in #518
  - Remove Env.close in VideoRecorder.close by @qgallouedec in #533
  - Fix VideoRecorder and RecordVideoV0 to move import moviepy such that __del__ doesn't raise AttributeErrors by @pseudo-rnd-thoughts in #553
- MuJoCo
  - Remove Hopper-v4's old render API func by @Kallinteris-Andreas in #588
  - Fix TypeError when closing rendering by @sonelu in #440
  - Fix the wrong nstep in _step_mujoco_simulation function of MujocoEnv by @xuanhien070594 in #424
  - Allow a different number of actuator controls from the action space by @reginald-mclean in #604
Documentation Updates
- Allow users to view source code of referenced objects on the website by @pseudo-rnd-thoughts in #497
- Update website homepage by @elliottower in #482
- Make Atari documentation consistent by @pseudo-rnd-thoughts in #418 and add missing descriptions by @dylwil3 in #510
- Add third party envs: safety gymnasium, PyFlyt, Gym-Trading-Env, stable-retro, DACBench, gym-cellular-automata by @elliottower, @stefanbschneider, @ClementPerroud, @jjshoots, @MatPoliquin, and @robertoschiavone in #450, #451, #474, #487, #529, #538, #581
- Update MuJoCo documentation for all environments and the base mujoco environment by @Kallinteris-Andreas in #524, #522
- Update CartPole reward documentation to clarify different maximum rewards for v0 and v1 by @robertoschiavone in #429
- Clarify FrozenLake time limit for FrozenLake4x4 and FrozenLake8x8 environments by @yaniv-peretz in #459
- Fix typo in the documentation for single_observation_space by @kvrban in #491
- Fix the rendering of warnings on the website by @helpingstar in #520
Full Changelog: v0.28.1...v0.29.0
v0.28.1¶
Released on 2023-03-25 - GitHub - PyPI
v0.28.1 Release notes
Small emergency release to fix several issues
- Fixed gymnasium.vector not being imported in gymnasium/__init__.py (#403)
- Update third party envs to separate environments that support gymnasium and gym and have a consistent style (#404)
- Update the documentation for v0.28: the frontpage gif had the wrong link, experimental documentation was missing, and gym release notes were added (#405)
Full Changelog: v0.28.0...v0.28.1
v0.28.0¶
Released on 2023-03-24 - GitHub - PyPI
v0.28.0 Release notes
This release introduces improved support for the reproducibility of Gymnasium environments, particularly for offline reinforcement learning. gym.make can now create the entire environment stack, including wrappers, such that training libraries or offline datasets can specify all of the arguments and wrappers used for an environment. For the majority of standard usage (gym.make("EnvironmentName-v0")), this will be backwards compatible; except for certain fairly uncommon cases (i.e., env.spec and env.unwrapped.spec return different specs), this is a breaking change. See the reproducibility details section for more info.
In v0.27, we added the experimental folder to allow us to develop several new features (wrappers and hardware-accelerated environments). We've introduced a new experimental VectorEnv class. This class does not inherit from the standard Env class, and will allow for dramatically more efficient parallelization features. We plan to improve the implementation and add vector-based wrappers in several minor releases over the next few months.
Additionally, we have optimized module loading so that PyTorch or Jax are only loaded when users import wrappers that require them, not on import gymnasium.
Reproducibility details
In previous versions, Gymnasium supported gym.make(spec), where the spec is an EnvSpec from gym.spec(str) or env.spec, and worked identically to the string-based gym.make(""). In both cases, there was no way to specify additional wrappers that should be applied to an environment. With this release, we added additional_wrappers to EnvSpec for specifying wrappers applied to the base environment (TimeLimit, PassiveEnvChecker, Autoreset and ApiCompatibility are not included as they are specified in other fields).
This additional field will allow users to accurately save or reproduce an environment used in training for a policy or to generate an offline RL dataset. We provide a JSON converter function (EnvSpec.to_json) for saving the EnvSpec to a "safe" file type; however, there are several cases (NumPy data, functions) which cannot be saved to JSON. In these cases, we recommend pickle, but be warned that this can allow remote users to include malicious data in the spec.
import gymnasium as gym
env = gym.make("CartPole-v0")
env = gym.wrappers.TimeAwareObservation(env)
print(env)
# <TimeAwareObservation<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v0>>>>>>
env_spec = env.spec
env_spec.pprint()
# id=CartPole-v0
# reward_threshold=195.0
# max_episode_steps=200
# additional_wrappers=[
# name=TimeAwareObservation, kwargs={}
# ]
import json
import pickle
json_env_spec = json.loads(env_spec.to_json())
pickled_env_spec = pickle.loads(pickle.dumps(env_spec))
recreated_env = gym.make(json_env_spec)
print(recreated_env)
# <TimeAwareObservation<TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v0>>>>>>
# Be aware that the `TimeAwareObservation` was included by `make`
To support this type of recreation, wrappers must inherit from gym.utils.RecordConstructorUtils to allow gym.make to know what arguments to create the wrapper with. Gymnasium has implemented this for all built-in wrappers, but for external projects, this should be added to each wrapper. To do this, call gym.utils.RecordConstructorUtils.__init__(self, …) in the first line of the wrapper's constructor with identical keyword arguments as passed to the wrapper's constructor, except for env. As an example, see the Atari Preprocessing wrapper.
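As a minimal sketch of this pattern (the ScaleReward wrapper is hypothetical, and the class name follows the one used in these notes):

import gymnasium as gym


class ScaleReward(gym.RewardWrapper, gym.utils.RecordConstructorUtils):
    """A hypothetical wrapper, shown only to illustrate the pattern."""

    def __init__(self, env: gym.Env, scale: float = 1.0):
        # Record the constructor arguments (everything except env) so that
        # make can recreate this wrapper from an EnvSpec.
        gym.utils.RecordConstructorUtils.__init__(self, scale=scale)
        gym.RewardWrapper.__init__(self, env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * reward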
For a more detailed discussion, see the original PRs - #292 and #355
Other Major Changes
- In Gymnasium v0.26, the GymV22Compatibility environment was added to support Gym-based environments in Gymnasium. However, the name was incorrect as the env supported Gym's v0.21 API, not v0.22; therefore, we have updated it to GymV21Compatibility to accurately reflect the API supported. #282
- The Sequence space allows for a dynamic number of elements in an observation or action space sample. To make this more efficient, we added a stack argument which can support a more efficient representation of an element than a tuple, which was what was previously supported (see the sketch after this list). #284
- Box.sample previously would clip incorrectly for up-bounded spaces such that 0 could never be sampled if the dtype was discrete or boolean. This is fixed such that 0 can be sampled in these cases. #249
- If jax or pytorch was installed, then on import gymnasium both of these modules would also be loaded, causing significant slowdowns in load time. This is now fixed such that jax and torch are only loaded when a particular wrapper is loaded by the user. #323
- In v0.26, we added parameters for Wrapper to allow different observation and action types to be specified for the wrapper and its sub-environment. However, this raised type issues with pyright and mypy; this is now fixed through Wrapper having four generic arguments, [ObsType, ActType, WrappedEnvObsType, WrappedEnvActType]. #337
- In v0.25 and v0.26, several new space types were introduced: Text, Graph and Sequence; however, the vector utility functions were not updated to support these spaces. Support for these spaces has been added to the experimental vector space utility functions: batch_space, concatenate, iterate and create_empty_array. #223
- Due to a lack of testing, the experimental stateful observation wrappers (FrameStackObservation, DelayObservation and TimeAwareObservation) did not work as expected. These wrappers are now fixed and testing has been added. #224
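A minimal sketch of the Sequence space's stack argument (shapes in the comments are indicative):

from gymnasium.spaces import Box, Sequence

# Without stacking, a sample is a variable-length tuple of elements.
tuple_space = Sequence(Box(-1.0, 1.0, shape=(3,)))
# With stack=True, a sample is a single stacked array instead,
# e.g. shape (n, 3) for a sample with n elements.
stacked_space = Sequence(Box(-1.0, 1.0, shape=(3,)), stack=True)

print(tuple_space.sample())
print(stacked_space.sample())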
Minor changes
- Allow the statistics of NormalizeX wrappers to be disabled and enabled for use during evaluation by @raphajaner in #268
- Fix AttributeError in lunar_lander.py by @DrRyanHuang in #278
- Add testing for docstrings (doctest) such that docstrings match implementations by @valentin-cnt in #281
- Type hint fixes and added __all__ dunder by @howardh in #321
- Fix type hint errors in gymnasium/spaces by @valentin-cnt in #327
- Update the experimental vector shared memory util functions by @pseudo-rnd-thoughts in #339
- Change Gymnasium Notices to Farama Notifications by @jjshoots in #332
- Added Jax-based Blackjack environment by @balisujohn in #338
Documentation changes
- Fix references of the MultiBinary and MultiDiscrete classes in documentation by @Matyasch in #279
- Add Comet integration by @nerdyespresso in #304
- Update atari documentation by @pseudo-rnd-thoughts in #330
- Document Box integer bounds by @mihaic in #331
- Add docstring parser to remove duplicate in Gymnasium website by @valentin-cnt in #329
- Fix a grammatical mistake in basic usage page by @keyb0ardninja in #333
- Update docs/README.md to link to a new CONTRIBUTING.md for docs by @mgoulao in #340
- MuJoCo/Ant: clarify the lack of use_contact_forces on v3 (and older) by @Kallinteris-Andreas in #342
What's Changed
Thank you to our new contributors in this release: @Matyasch, @DrRyanHuang, @nerdyespresso, @khoda81, @howardh, @mihaic, and @keyb0ardninja.
Full Changelog: v0.27.1...v0.28.0
v0.27.1¶
Released on 2023-01-20 - GitHub - PyPI
Release Notes
Bugs fixed
- Replace np.bool8 with np.bool_ for the numpy 1.24 deprecation warning by @pseudo-rnd-thoughts in #221
- Remove shimmy as a core dependency by @pseudo-rnd-thoughts in #272
- Fix silent bug in ResizeObservation for 2-dimensional observations by @ianyfan in #230 and by @RedTachyon in #254
- Change env checker assertion to warning by @jjshoots in #215
- Revert make error when render mode is used without metadata render modes by @pseudo-rnd-thoughts in #216
- Update prompt messages for extra dependencies by @XuehaiPan in #250
- Fix return type of AsyncVectorEnv.reset by @younik in #252
- Update the jumpy error to specify the pip install is jax-jumpy by @pseudo-rnd-thoughts in #255
- Fix type annotations of callable to Callable by @ianyfan in #259
- Fix experimental normalize reward wrapper by @rafaelcp in #277
New features/improvements
- Improve LunarLander-v2 step performance by >1.5x by @PaulMest in #235
- Added vector env support to StepAPICompatibility wrapper by @nidhishs in #238
- Allow Sequence to accept stacked np arrays if the feature space is Box by @jjshoots in #241
- Improve the warning when an error is raised from a plugin by @pseudo-rnd-thoughts in #225
- Add changelog (release notes) to the website by @mgoulao in #257
- Implement RecordVideoV0 by @younik in #246
- Add explicit error messages when unflattening Discrete and MultiDiscrete fails by @PierreMardon in #267
Documentation updates
- Added doctest to CI and fixed all existing errors in docstrings by @valentin-cnt in #274
- Add a tutorial for vectorized envs using A2C by @till2 in #234
- Fix MuJoCo.Humanoid action description by @Kallinteris-Andreas in #206
- Ant use_contact_forces obs and reward doc by @Kallinteris-Andreas in #218
- MuJoCo.Reacher-v4 doc fixes by @Kallinteris-Andreas in #219
- Mention truncation in the migration guide by @RedTachyon in #105
- docs(tutorials): fixed environment creation link by @lpizzinidev in #244
- MuJoCo/Hopper doc minor typo fix by @Kallinteris-Andreas in #247
- Add comment describing what convolve does in A2C tutorial by @metric-space in #264
- Fix environment versioning in README.md by @younik in #270
- Add Tutorials galleries by @mgoulao in #258
Thanks to the new contributors to Gymnasium; if you want to get involved, join our Discord server, linked in the readme.
- @PaulMest made their first contribution in #235
- @nidhishs made their first contribution in #238
- @lpizzinidev made their first contribution in #244
- @ianyfan made their first contribution in #230
- @metric-space made their first contribution in #264
- @PierreMardon made their first contribution in #267
- @valentin-cnt made their first contribution in #274
- @rafaelcp made their first contribution in #277
Full Changelog: v0.27.0...v0.27.1
v0.27.0¶
Released on 2022-12-12 - GitHub - PyPI
Release Notes
Gymnasium 0.27.0 is our first major release of Gymnasium. It has several significant new features and numerous small bug fixes and code quality improvements as we work through our backlog. There should be no breaking changes beyond dropping Python 3.6 support and removing the mujoco Viewer class in favor of a MujocoRendering class. You should be able to upgrade your code that's using Gymnasium 0.26.x to 0.27.0 with little to no effort.
As always, our development roadmap is publicly available here so you can follow our future plans. The only large breaking changes still planned are switching selected environments to use hardware-accelerated physics engines, and our long-standing plans for overhauling the vector API and built-in wrappers.
This release notably includes an entirely new part of the library: `gymnasium.experimental`. We are adding new features, wrappers and a functional environment API, discussed below, for users to test and try out, to find bugs and provide feedback.
New Wrappers
These new wrappers, accessible in `gymnasium.experimental.wrappers` (see the full list at https://gymnasium.farama.org/main/api/experimental/), are aimed at replacing the wrappers in Gymnasium v0.30.0 and contain several improvements:
- (Work in progress) Support arbitrarily complex observation / action spaces. As RL has advanced, action and observation spaces are becoming more complex, and the current wrappers were not implemented with this in mind.
- Support for Jax-based environments. With hardware-accelerated environments, i.e. Brax, written in Jax, and similar PyTorch-based programs, NumPy is not the only game in town anymore for writing environments. Therefore, these upgrades will use Jumpy, a project developed by the Farama Foundation to provide automatic compatibility for NumPy, Jax and, in the future, PyTorch data for a large subset of the NumPy functions.
- More wrappers. Projects like Supersuit aimed to bring more wrappers for RL; however, many users were not aware of them, so we plan to move these wrappers into Gymnasium. If common wrappers are missing from the list provided above, please create an issue; we would be interested in adding them.
- Versioning. Like environments, the implementation details of wrappers can cause changes in agent performance. Therefore, we propose adding version numbers to all wrappers, e.g., `LambdaActionV0` (see the sketch after this list). We don't expect these version numbers to change regularly; they will act similarly to environment version numbers. This should ensure that all users know when significant changes could affect their agent's performance, for environments and wrappers alike. Additionally, we hope that this will improve the reproducibility of RL in the future, which is critical for academia.
- In v0.28, we aim to rewrite `VectorEnv` to not inherit from `Env`; as a result, new vectorized versions of the wrappers will be provided.
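As a rough illustration of wrapper versioning, here is a minimal sketch of applying a versioned wrapper to an environment. The import path and the `LambdaActionV0(env, func)` signature are assumptions about the experimental module, so verify them against your installed version.

```python
# A minimal sketch, not the definitive API: the import path and the
# LambdaActionV0(env, func) signature are assumptions about the
# experimental module.
import gymnasium as gym
from gymnasium.experimental.wrappers import LambdaActionV0

env = gym.make("MountainCarContinuous-v0")
# Halve every action before it reaches the environment; the V0 suffix pins
# the wrapper's behaviour the same way environment version numbers do.
env = LambdaActionV0(env, lambda action: 0.5 * action)

obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```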
Core developers: @gianlucadecola, @RedTachyon, @pseudo-rnd-thoughts
Functional API
The `Env` class provides a very generic structure for environments to be written in, allowing high flexibility in the program structure. However, this limits the ability to efficiently vectorize environments, compartmentalize the environment code, etc. Therefore, `gymnasium.experimental.FuncEnv` provides a much stricter structure for environment implementation, with stateless functions for every stage of the environment implementation. This class does not inherit from `Env` and requires a translation / compatibility class for doing this. We already provide a `FuncJaxEnv` for converting Jax-based `FuncEnv`s to `Env`. We hope this will help improve the readability of environment implementations, along with potential speed-ups for users that vectorize their code.
This API is very experimental and open to changes in the future. We are interested in feedback from users who try out the API, which we believe will be of particular interest to users exploring RL planning, model-based RL, and modifying environment functions like the rewards.
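To make the stateless structure concrete, here is a self-contained toy sketch in the functional style. It is a plain class for illustration only; the stage names (`initial`, `transition`, `observation`, `reward`, `terminal`) mirror the stages described above and are not the guaranteed `FuncEnv` signatures.

```python
# Illustrative only: a toy 1-D environment where every stage is a pure
# function of an explicit state, the style that FuncEnv encourages.
import numpy as np


class MoveToOrigin:
    def initial(self, rng: np.random.Generator) -> np.ndarray:
        # The state is just the agent's position, sampled uniformly.
        return rng.uniform(-1.0, 1.0, size=(1,))

    def transition(self, state, action, rng) -> np.ndarray:
        # The next state depends only on the inputs -- no hidden mutation.
        return state + 0.1 * np.clip(action, -1.0, 1.0)

    def observation(self, state) -> np.ndarray:
        return state.copy()

    def reward(self, state, action, next_state) -> float:
        return float(-np.abs(next_state).sum())  # closer to zero is better

    def terminal(self, state) -> bool:
        return bool(np.abs(state).sum() < 0.05)


env = MoveToOrigin()
rng = np.random.default_rng(0)
state = env.initial(rng)
action = np.array([-1.0])
next_state = env.transition(state, action, rng)
print(env.observation(next_state),
      env.reward(state, action, next_state),
      env.terminal(next_state))
```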
Core developers: @RedTachyon, @pseudo-rnd-thoughts, @balisujohn
Other Major changes
- Refactor MuJoCo rendering mechanisms to use a separate thread for OpenGL; remove `Viewer` in favor of `MujocoRenderer`, which offscreen, human and other render modes can use, by @rodrigodelazcano in #112
- Add deprecation warning to `gym.make(..., apply_env_compatibility=True)` in favour of `gym.make("GymV22Environment", env_id="...")` by @pseudo-rnd-thoughts in #125
- Add `gymnasium.pprint_registry()` for pretty printing the gymnasium registry by @kad99kev in #124
- Change `Discrete.dtype` to `np.int64` such that samples are `np.int64`, not Python ints, by @pseudo-rnd-thoughts in #141
- Add migration guide for OpenAI Gym v21 to v26 by @pseudo-rnd-thoughts in #72
- Add complete type hinting of `core.py` for `Env`, `Wrapper` and more by @pseudo-rnd-thoughts in #39
- Add complete type hinting for all spaces in `gymnasium.spaces` by @pseudo-rnd-thoughts in #37
- Make window in `play()` resizable by @Markus28 in #190
- Add REINFORCE implementation tutorial by @siddarth-c in #155
Bug fixes and documentation changes
- Remove auto close in `VideoRecorder` wrapper by @younik in #42
- Change `seeding.np_random` error message to report seed type by @theo-brown in #74
- Include shape in MujocoEnv error message by @ikamensh in #83
- Add pretty Feature/GitHub issue form by @tobirohrer in #89
- Added testing for the render return data in `check_env` and `PassiveEnvChecker` by @Markus28 in #117
- Fix docstring and update action space description for classic control environments by @Thytu in #123
- Fix `__all__` in root `__init__.py` to specify the correct folders by @pseudo-rnd-thoughts in #130
- Fix `play()` assertion error by @Markus28 in #132
- Update documentation for Frozen Lake `is_slippy` by @MarionJS in #136
- Fixed warnings when `render_mode` is None by @younik in #143
- Added `is_np_flattenable` property to documentation by @Markus28 in #172
- Updated Wrapper documentation by @Markus28 in #173
- Updated formatting of spaces documentation by @Markus28 in #174
- For FrozenLake, add seeding in random map generation by @kir0ul in #139
- Add notes for issues when unflattening samples from flattened spaces by @rusu24edward in #164
- Include pusher environment page to website by @axb2035 in #171
- Add check in `AsyncVectorEnv` for success before splitting result in `step_wait` by @aaronwalsman in #178
- Add documentation for `MuJoCo.Ant-v4.use_contact_forces` by @Kallinteris-Andreas in #183
- Fix typos in README.md by @cool-RR in #184
- Add documentation for `MuJoCo.Ant` v4 changelog by @Kallinteris-Andreas in #186
- Fix `MuJoCo.Ant` action order in documentation by @Kallinteris-Andreas in #208
- Add `raise-from` exceptions for the whole codebase by @cool-RR in #205
Behind-the-scenes changes
- Docs Versioning by @mgoulao in #73
- Added Atari environments to tests, removed dead code by @Markus28 in #78
- Fix missing build steps in versioning workflows by @mgoulao in #81
- Small improvements to environments pages by @mgoulao in #110
- Update the third-party environment documentation by @pseudo-rnd-thoughts in #138
- Update docstrings for improved documentation by @axb2035 in #160
- Test core dependencies in CI by @pseudo-rnd-thoughts in #146
- Update and rerun `pre-commit` hooks for better code quality by @XuehaiPan in #179
v0.26.3¶
Released on 2022-10-24 - GitHub - PyPI
Release Notes
Note: ale-py (Atari) has not been updated to Gymnasium yet; therefore, `pip install gymnasium[atari]` will fail. This will be fixed in v0.27. In the meantime, use `pip install shimmy[atari]` as a workaround.
Bug Fixes
- Added Gym-Gymnasium compatibility converter to allow users to use Gym environments in Gymnasium by @RedTachyon in #61
- Modify metadata in the `HumanRendering` and `RenderCollection` wrappers to have the correct metadata by @RedTachyon in #35
- Simplified `EpisodeStatisticsRecorder` wrapper by @DavidSlayback in #31
- Fix integer overflow in `MultiDiscrete.flatten()` by @olipinski in #55
- Re-add the ability to specify the XML file for Mujoco environments by @Kallinteris-Andreas in #70
Documentation changes
- Add a tutorial for training an agent in Blackjack by @till2 in #64
- A very long list of documentation updates by @mgoulao, @vairodp, @WillDudley, @pseudo-rnd-thoughts and @jjshoots
Full Changelog: v0.26.2...v0.26.3
Thank you to the new contributors
- @vairodp made their first contribution in #41
- @DavidSlayback made their first contribution in #31
- @WillDudley made their first contribution in #51
- @olipinski made their first contribution in #55
- @jjshoots made their first contribution in #58
- @vmoens made their first contribution in #60
- @till2 made their first contribution in #64
- @Kallinteris-Andreas made their first contribution in #70
v0.26.2¶
Released on 2022-10-05 - GitHub - PyPI
This release is an upstreamed version of Gym v0.26.2
Bug Fixes
- As reset now returns (obs, info), in the vector environments this caused the final step's info to be overwritten. Now, the final observation and info are contained within the info as "final_observation" and "final_info" (see the sketch after this list) @pseudo-rnd-thoughts
- Adds warnings when trying to render without specifying the render_mode @younik
- Updates Atari Preprocessing such that the wrapper can be pickled @vermouth1992
- GitHub CI was hardened such that the CI just has read permissions @sashashura
- Clarify and fix typo in GraphInstance @ekalosak
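As a sketch of the first fix above, the snippet below reads the saved final observation and info out of a vector environment's info dict after an autoreset. It assumes the v0.26-era dict-of-arrays info format (including the underscore-prefixed `_final_observation` mask) and the `gym.vector.make` helper of that period.

```python
# A hedged sketch: assumes the v0.26-era vector info format with
# "final_observation" / "final_info" entries and "_final_observation" masks.
import gymnasium as gym
import numpy as np

envs = gym.vector.make("CartPole-v1", num_envs=4)
obs, infos = envs.reset(seed=0)
for _ in range(500):
    actions = envs.action_space.sample()
    obs, rewards, terminations, truncations, infos = envs.step(actions)
    if "final_observation" in infos:  # at least one sub-env just finished
        for i in np.where(infos["_final_observation"])[0]:
            final_obs = infos["final_observation"][i]  # obs before the autoreset
            final_info = infos["final_info"][i]        # info before the autoreset
envs.close()
```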
v0.26.1¶
Released on 2022-09-16 - GitHub - PyPI
This release is an upstreamed version of Gym v0.26.1
In addition, the gym docs repo has been merged into the new website, https://gymnasium.farama.org/
v0.26.0: Initial Release¶
Released on 2022-09-13 - GitHub - PyPI
This is the first release of Gymnasium, a maintained fork of OpenAI Gym
This release is identical to Gym v0.26.0 except for the project name (Gymnasium) and the Code of Conduct
Read #12 for the roadmap of changes