Migration Guide - v0.21 to v1.0.0

Who Should Read This Guide?

If you’re new to Gymnasium: You can probably skip this page! This guide is for users migrating from older versions of OpenAI Gym. If you’re just starting with RL, head to Basic Usage instead.

If you’re migrating from OpenAI Gym: This guide will help you update your code to work with Gymnasium. The changes are significant but straightforward once you understand the reasoning behind them.

If you’re updating old tutorials: Many online RL tutorials use the old v0.21 API. This guide shows you how to modernize that code.

Why Did the API Change?

Gymnasium is a fork of OpenAI Gym v0.26, which introduced breaking changes from Gym v0.21. These changes weren’t made lightly - they solved important problems that made RL research and development more difficult.

The main issues with the old API were:

  • Ambiguous episode endings: The single done flag couldn’t distinguish between “task completed” and “time limit reached”

  • Inconsistent seeding: Random number generation was unreliable and hard to reproduce

  • Rendering complexity: Switching between visual modes was unnecessarily complicated

  • Reproducibility problems: Subtle bugs made it difficult to reproduce research results

For environments still using the v0.21 API, see the compatibility guide.

Quick Reference: Complete Changes Table

| Component | v0.21 (Old) | v0.26+ (New) | Impact |
|---|---|---|---|
| Package Import | import gym | import gymnasium as gym | All code |
| Environment Reset | obs = env.reset() | obs, info = env.reset() | Training loops |
| Random Seeding | env.seed(42) | env.reset(seed=42) | Reproducibility |
| Step Function | obs, reward, done, info = env.step(action) | obs, reward, terminated, truncated, info = env.step(action) | RL algorithms |
| Episode Ending | while not done: | while not (terminated or truncated): | Training loops |
| Render Mode | env.render(mode="human") | gym.make(env_id, render_mode="human") | Visualization |
| Time Limit Detection | info.get('TimeLimit.truncated') | truncated return value | RL algorithms |
| Value Bootstrapping | target = reward + (1-done) * gamma * next_value | target = reward + (1-terminated) * gamma * next_value | RL correctness |

Side-by-Side Code Comparison

Old v0.21 Code

import gym

# Environment creation and seeding
env = gym.make("LunarLander-v3", options={})
env.seed(123)
observation = env.reset()

# Training loop
done = False
while not done:
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    env.render(mode="human")

env.close()

New v0.26+ Code (Including v1.0.0)

import gymnasium as gym  # Note: 'gymnasium' not 'gym'

# Environment creation with render mode specified upfront
env = gym.make("LunarLander-v3", render_mode="human")

# Reset with seed parameter
observation, info = env.reset(seed=123, options={})

# Training loop with terminated/truncated distinction
done = False
while not done:
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Episode ends if either terminated OR truncated
    done = terminated or truncated

env.close()

Key Changes Breakdown

1. Package Name Change

Old: import gym
New: import gymnasium as gym

Why: Gymnasium is a separate project that maintains and improves upon the original Gym codebase.

# Update your imports
# OLD
import gym

# NEW
import gymnasium as gym

2. Seeding and Random Number Generation

The biggest conceptual change is how randomness is handled.

Old v0.21: Separate seed() method

env = gym.make("CartPole-v1")
env.seed(42)  # Set random seed
obs = env.reset()  # Reset environment

New v0.26+: Seed passed to reset()

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)  # Seed and reset together

Why this changed: Some environments (especially emulated games) can only set their random state at the beginning of an episode, not mid-episode. The old approach could lead to inconsistent behavior.

Practical impact:

# OLD: Seeding applied to all future episodes
env.seed(42)
for episode in range(10):
    obs = env.reset()

# NEW: Each episode can have its own seed
for episode in range(10):
    obs, info = env.reset(seed=42 + episode)  # Each episode gets unique seed
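
One related detail: env.reset(seed=...) seeds the environment’s internal random number generator, but the action and observation spaces keep their own generators. If you rely on env.action_space.sample() for reproducible rollouts, seed the space explicitly. A minimal sketch:

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)      # Seeds the environment's np_random
env.action_space.seed(42)           # Seeds the action space's own RNG
action = env.action_space.sample()  # Now reproducible across runs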

3. Environment Reset Changes

Old v0.21: Returns only observation

observation = env.reset()

New v0.26+: Returns observation AND info

observation, info = env.reset()

Why this changed:

  • info provides consistent access to debugging information

  • seed parameter enables reproducible episodes

  • options parameter allows episode-specific configuration

Common migration pattern:

# If you don't need the new features, just unpack the tuple
obs, _ = env.reset()  # Ignore info with underscore

# If you want to maintain the same random behavior as v0.21
env.reset(seed=42)  # Set seed once
# Then for subsequent resets:
obs, info = env.reset()  # Uses internal random state
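
The options argument is simply forwarded to the environment’s reset(), and what it accepts (if anything) is environment-specific. As a hedged sketch, a custom environment might consume it like this (the "start" option is purely illustrative, not part of any built-in environment):

import gymnasium as gym
import numpy as np

class MyEnv(gym.Env):
    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # Seeds self.np_random
        # Episode-specific configuration read from `options`
        start = 0.0 if options is None else options.get("start", 0.0)
        self._state = np.array([start], dtype=np.float32)
        return self._state, {}

    def step(self, action):
        self._state = np.clip(self._state + (action - 0.5) * 0.1, -1.0, 1.0)
        terminated = bool(abs(self._state[0]) >= 1.0)
        return self._state, 1.0, terminated, False, {}

obs, info = MyEnv().reset(options={"start": 0.5})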

4. Step Function: The done → terminated/truncated Split

This is the most important change for training algorithms.

Old v0.21: Single done flag

obs, reward, done, info = env.step(action)

New v0.26+: Separate terminated and truncated flags

obs, reward, terminated, truncated, info = env.step(action)

Why this matters:

  • terminated: Episode ended because the task was completed or failed (agent reached goal, died, etc.)

  • truncated: Episode ended due to external constraints (time limit, step limit, etc.)

This distinction is crucial for value function bootstrapping in RL algorithms:

# OLD (ambiguous)
if done:
    # Should we bootstrap? We don't know if this was natural termination or time limit!
    next_value = 0  # Assumption that may be wrong

# NEW (clear)
if terminated:
    next_value = 0      # Natural ending - no future value
elif truncated:
    next_value = value_function(next_obs)  # Time limit - estimate future value

Migration strategy:

# Simple migration (works for many cases)
obs, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated

# Better migration (preserves RL algorithm correctness)
obs, reward, terminated, truncated, info = env.step(action)
if terminated:
    # Episode naturally ended - use reward as-is
    target = reward
elif truncated:
    # Episode cut short - may need to estimate remaining value
    target = reward + discount * estimate_value(obs)

For more information, see our blog post about it.
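
In practice this means experience storage should record terminated (rather than the combined done flag), since that is what the bootstrapping rule above needs. A minimal sketch, with replay_buffer as an illustrative plain list rather than anything provided by Gymnasium:

import gymnasium as gym

env = gym.make("CartPole-v1")
replay_buffer = []

obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, info = env.step(action)

    # Store `terminated` for bootstrapping; the combined flag only controls resets
    replay_buffer.append((obs, action, reward, next_obs, terminated))

    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()

env.close()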

5. Render Mode Changes

Old v0.21: Render mode specified each time

env = gym.make("CartPole-v1")
env.render(mode="human")     # Visual window
env.render(mode="rgb_array") # Get pixel array

New v0.26+: Render mode fixed at creation

env = gym.make("CartPole-v1", render_mode="human")     # For visual display
env = gym.make("CartPole-v1", render_mode="rgb_array") # For recording
env.render()  # Uses the mode specified at creation

Why this changed: Some environments can’t switch render modes on-the-fly. Fixing the mode at creation enables better optimization and prevents bugs.

Practical implications:

# OLD: Could switch modes dynamically
env = gym.make("CartPole-v1")
for episode in range(10):
    # ... episode code ...
    if episode % 10 == 0:
        env.render(mode="human")  # Show every 10th episode

# NEW: Create separate environments for different purposes
training_env = gym.make("CartPole-v1")  # No rendering for speed
eval_env = gym.make("CartPole-v1", render_mode="human")  # Visual for evaluation

# Or use None for no rendering, then create visual env when needed
env = gym.make("CartPole-v1", render_mode=None)  # Fast training
if need_visualization:
    visual_env = gym.make("CartPole-v1", render_mode="human")

TimeLimit Wrapper Changes

The TimeLimit wrapper behavior also changed to align with the new termination model.

Old v0.21: Added TimeLimit.truncated to the info dict

obs, reward, done, info = env.step(action)
if done and info.get('TimeLimit.truncated', False):
    # Episode ended due to time limit
    pass

New v0.26+: Uses the truncated return value

obs, reward, terminated, truncated, info = env.step(action)
if truncated:
    # Episode ended due to time limit (or other truncation)
    pass
if terminated:
    # Episode ended naturally (success/failure)
    pass

This makes time limit detection much cleaner and more explicit.
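
Environments registered with a max_episode_steps already get this wrapper applied by gym.make(); you can also apply it yourself to impose a custom step budget. A brief sketch:

import gymnasium as gym
from gymnasium.wrappers import TimeLimit

# Impose an explicit 50-step budget on top of the base environment
env = TimeLimit(gym.make("CartPole-v1"), max_episode_steps=50)

obs, info = env.reset(seed=0)
while True:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if truncated:
        print("Hit the 50-step limit")       # Set by the TimeLimit wrapper
        break
    if terminated:
        print("Pole fell before the limit")  # Natural termination
        break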


Updating Your Training Code

Basic Training Loop Migration

Old v0.21 pattern:

for episode in range(num_episodes):
    obs = env.reset()
    done = False

    while not done:
        action = agent.get_action(obs)
        next_obs, reward, done, info = env.step(action)

        # Train agent (this may have bugs due to ambiguous 'done')
        agent.learn(obs, action, reward, next_obs, done)
        obs = next_obs

New v0.26+ pattern:

for episode in range(num_episodes):
    obs, info = env.reset(seed=episode)  # Optional: unique seed per episode
    terminated, truncated = False, False

    while not (terminated or truncated):
        action = agent.get_action(obs)
        next_obs, reward, terminated, truncated, info = env.step(action)

        # Train agent with proper termination handling
        agent.learn(obs, action, reward, next_obs, terminated)
        obs = next_obs

Q-Learning Update Migration

Old v0.21 (potentially incorrect):

def update_q_value(obs, action, reward, next_obs, done):
    if done:
        target = reward  # Assumes all episode endings are natural terminations
    else:
        target = reward + gamma * max(q_table[next_obs])

    q_table[obs][action] += lr * (target - q_table[obs][action])

New v0.26+ (correct):

def update_q_value(obs, action, reward, next_obs, terminated):
    if terminated:
        # Natural termination - no future value
        target = reward
    else:
        # Episode continues - truncation has no impact on the possible future value
        target = reward + gamma * max(q_table[next_obs])

    q_table[obs][action] += lr * (target - q_table[obs][action])

Deep RL Framework Migration

Most deep RL libraries have already been updated to the new API; see their documentation for more information.
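
As one example, recent versions of Stable-Baselines3 (2.0 and later) accept Gymnasium environments directly. A minimal sketch, assuming stable-baselines3 is installed:

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)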

Environment-Specific Changes

Removed Environments

Some environments were moved or removed:

# OLD: Robotics environments in main gym
import gym
env = gym.make("FetchReach-v1")  # No longer available

# NEW: Moved to separate package
import gymnasium

import gymnasium_robotics
import ale_py

gymnasium.register_envs(gymnasium_robotics)
gymnasium.register_envs(ale_py)

env = gymnasium.make("FetchReach-v1")
env = gymnasium.make("ALE/Pong-v5")

Compatibility Helpers

Using Old Environments

If you need to use an environment that hasn’t been updated to the new API:

# For environments still using the old gym API
env = gym.make("GymV21Environment-v0", env_id="OldEnv-v0")
# This wrapper converts the old API to the new API automatically

For more details, see the compatibility guide.

Testing Your Migration

After migrating, verify that:

  • [ ] Import statements use gymnasium instead of gym

  • [ ] Reset calls handle the (obs, info) return format

  • [ ] Step calls handle terminated and truncated separately

  • [ ] Render mode is specified during environment creation

  • [ ] Random seeding uses the seed parameter in reset()

  • [ ] Training algorithms properly distinguish termination types

Use check_env from gymnasium.utils.env_checker to verify that your environment implementation follows the new API.
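
A short usage sketch for the checker:

import gymnasium as gym
from gymnasium.utils.env_checker import check_env

env = gym.make("CartPole-v1")
check_env(env.unwrapped)  # Raises or warns if the env violates the Gymnasium API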

Getting Help

If you encounter issues during migration:

  1. Check the compatibility guide: Some old environments can be used with compatibility wrappers

  2. Look at the environment documentation: Each environment may have specific migration notes

  3. Test with simple environments first: Start with CartPole before moving to complex environments

  4. Compare old vs new behavior: Run the same code with both APIs to understand differences

Common resources: