v21 to v26 Migration Guide#
Gymnasium is a fork of OpenAI Gym v26, which introduced a large breaking change from Gym v21.
In this guide, we briefly outline the API changes from Gym v21 - which a number of tutorials have been written for - to Gym v26.
For environments still stuck on the v21 API, users can use the EnvCompatibility
wrapper to convert them to the v26 API.
For more information, see the guide.
Example code for v21#
import gym

env = gym.make("LunarLander-v2", options={})
env.seed(123)
observation = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, done, info = env.step(action)
    env.render(mode="human")

env.close()
Example code for v26#
import gym

env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=123, options={})

done = False
while not done:
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
Seed and random number generator#
Env.seed() has been removed from the Gym v26 environments in favour of Env.reset(seed=seed).
This allows the seed to be changed only on environment reset.
seed was removed because some environments use emulators whose random number generator cannot be changed within an episode; it must be set at the beginning of a new episode.
We are aware of cases where controlling the random number generator is important. In these cases, if the environment uses the built-in random number generator, users can set the seed manually with the np_random attribute.
Gymnasium v26 changed to using numpy.random.Generator instead of a custom random number generator.
This means that several functions, such as randint, were removed in favour of their Generator equivalents, such as integers.
While some environments might use an external random number generator, we recommend using the np_random attribute, which wrappers and external users can access and utilise.
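To illustrate the API difference, here is a minimal sketch using NumPy directly, outside of any environment: the legacy RandomState API (which Gym's old custom random number generator wrapped) uses randint, while the Generator API adopted in v26 uses integers.

```python
import numpy as np

# Legacy-style RNG: RandomState with randint (high is exclusive)
legacy_rng = np.random.RandomState(123)
a = legacy_rng.randint(0, 10)

# v26-style RNG: numpy.random.Generator with integers (high is exclusive by default)
rng = np.random.default_rng(123)
b = rng.integers(0, 10)
```

Note that the two APIs use different underlying bit generators, so the same seed does not produce the same stream across them.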
Environment Reset#
In v26, reset()
takes two optional keyword parameters, seed and options, and returns two values, observation and info.
This contrasts with v21, where reset() takes no parameters and returns only the observation.
The seed parameter sets the random number generator, while options allows additional data to be passed to the environment on reset.
For example, in classic control, the options
parameter now allows users to modify the range of the state bounds.
See the original PR for more details.
reset() additionally returns info, similar to the info returned by step().
This is important because info can include metrics or a valid action mask that is used or saved in the next step.
To update older environments, we highly recommend that super().reset(seed=seed) is called on the first line of reset().
This will automatically update np_random with the seed value.
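The pattern can be sketched without Gym itself. In this sketch, the hypothetical EnvBase class stands in for gym.Env, which performs the np_random seeding when super().reset(seed=...) is called:

```python
import numpy as np

class EnvBase:
    """Stand-in for gym.Env: seeds self.np_random when reset(seed=...) is called."""

    np_random = None

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.np_random = np.random.default_rng(seed)

class MyEnv(EnvBase):
    def reset(self, *, seed=None, options=None):
        # Recommended first line: lets the base class update np_random.
        super().reset(seed=seed)
        observation = self.np_random.uniform(-0.05, 0.05, size=4)
        return observation, {}  # v26 reset returns (observation, info)
```

With this in place, two environments reset with the same seed produce identical initial observations.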
Environment Step#
In v21, the type definition of step()
is tuple[ObsType, SupportsFloat, bool, dict[str, Any]],
representing the next observation, the reward from the step, whether the episode is done, and additional info from the step.
Due to reproducibility issues that will be expanded on in a blog post soon, we have changed the type definition to tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]],
adding an extra boolean value.
The single done boolean has been split into terminated and truncated.
These changes were introduced in Gym v26 (turned off by default in v25).
For users wishing to update, in most cases, replacing done
with terminated
and truncated=False
in step()
should address most issues.
However, environments that have reasons for episode truncation rather than termination should read through the associated PR.
For users looping through an environment, they should modify done = terminated or truncated
as is shown in the example code.
For training libraries, the primary difference is to change done
to terminated
, indicating whether bootstrapping should or shouldn't happen.
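For example, a one-step TD target should bootstrap from the next state's value only when the episode did not terminate, regardless of truncation. This is a minimal sketch; td_target is a hypothetical helper, not part of Gym:

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target: bootstrap from next_value only if the episode did not terminate."""
    return reward + gamma * next_value * (1.0 - float(terminated))

# Terminal transition: no bootstrapping from next_value.
target_terminal = td_target(1.0, 5.0, terminated=True)    # 1.0
# Truncated (but not terminated) transition: still bootstrap.
target_truncated = td_target(1.0, 5.0, terminated=False)  # 1.0 + 0.99 * 5.0 = 5.95
```

Using done in place of terminated here would wrongly cut off bootstrapping on time-limit truncations, which is the reproducibility issue the API change addresses.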
TimeLimit Wrapper#
In v21, the TimeLimit
wrapper added an extra key, TimeLimit.truncated, to the info
dictionary whenever the agent reached the time limit without reaching a terminal state.
In v26, this information is instead communicated through the truncated return value described in the previous section, which is True if the agent reaches the time limit, whether or not it reaches a terminal state. The old dictionary entry is equivalent to truncated and not terminated.
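A v21 step result can therefore be converted to the v26 five-tuple with a small shim. This is a sketch; convert_step_result is a hypothetical helper, and it assumes the v21 environment was wrapped by the TimeLimit wrapper so that truncation is reported via info:

```python
def convert_step_result(obs, reward, done, info):
    """Convert a v21 (obs, reward, done, info) tuple to the v26 five-tuple."""
    truncated = info.get("TimeLimit.truncated", False)
    # The old info entry is equivalent to `truncated and not terminated`,
    # so a done without the truncation flag is a genuine termination.
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info
```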
Environment Render#
In v26, a new render API was introduced such that the render mode is fixed at initialisation, as some environments don't allow on-the-fly render mode changes. Therefore, users should now specify render_mode
within gym.make
as shown in the v26 example code above.
For a more complete explanation of the changes, please refer to this summary.
Removed code#
GoalEnv - This was removed; users needing it should reimplement it or use Gymnasium Robotics, which contains an implementation of this environment.
from gym.envs.classic_control import rendering
- This was removed in favour of users implementing their own rendering systems. Gymnasium environments are coded using pygame.
Robotics environments - The robotics environments have been moved to the Gymnasium Robotics project.
Monitor wrapper - This wrapper was replaced with two separate wrappers,
RecordVideo
and RecordEpisodeStatistics.