Pusher#

../../../_images/pusher.gif

This environment is part of the Mujoco environments which contains general information about the environment.

Action Space

Box(-2.0, 2.0, (7,), float32)

Observation Space

Box(-inf, inf, (23,), float64)

import

gymnasium.make("Pusher-v5")

Description#

“Pusher” is a multi-jointed robot arm that is very similar to a human arm. The goal is to move a target cylinder (called object) to a goal position using the robot’s end effector (called fingertip). The robot consists of shoulder, elbow, forearm and wrist joints.

Action Space#

../../../_images/pusher.png

The action space is a Box(-2, 2, (7,), float32). An action (a, b) represents the torques applied at the hinge joints.

Num

Action

Control Min

Control Max

Name (in corresponding XML file)

Joint

Type (Unit)

0

Rotation of the panning the shoulder

-2

2

r_shoulder_pan_joint

hinge

torque (N m)

1

Rotation of the shoulder lifting joint

-2

2

r_shoulder_lift_joint

hinge

torque (N m)

2

Rotation of the shoulder rolling joint

-2

2

r_upper_arm_roll_joint

hinge

torque (N m)

3

Rotation of hinge joint that flexed the elbow

-2

2

r_elbow_flex_joint

hinge

torque (N m)

4

Rotation of hinge that rolls the forearm

-2

2

r_forearm_roll_joint

hinge

torque (N m)

5

Rotation of flexing the wrist

-2

2

r_wrist_flex_joint

hinge

torque (N m)

6

Rotation of rolling the wrist

-2

2

r_wrist_roll_joint

hinge

torque (N m)

Observation Space#

The observation space consists of the following parts (in order):

  • qpos (7 elements): Position values of the robot’s body parts.

  • qvel (7 elements): The velocities of these individual body parts (their derivatives).

  • xpos (3 elements): The coordinates of the fingertip of the pusher.

  • xpos (3 elements): The coordinates of the object to be moved.

  • xpos (3 elements): The coordinates of the goal position.

The observation space is a Box(-Inf, Inf, (17,), float64) where the elements are as follows:

Num

Observation

Min

Max

Name (in corresponding XML file)

Joint

Type (Unit)

0

Rotation of the panning the shoulder

-Inf

Inf

r_shoulder_pan_joint

hinge

angle (rad)

1

Rotation of the shoulder lifting joint

-Inf

Inf

r_shoulder_lift_joint

hinge

angle (rad)

2

Rotation of the shoulder rolling joint

-Inf

Inf

r_upper_arm_roll_joint

hinge

angle (rad)

3

Rotation of hinge joint that flexed the elbow

-Inf

Inf

r_elbow_flex_joint

hinge

angle (rad)

4

Rotation of hinge that rolls the forearm

-Inf

Inf

r_forearm_roll_joint

hinge

angle (rad)

5

Rotation of flexing the wrist

-Inf

Inf

r_wrist_flex_joint

hinge

angle (rad)

6

Rotation of rolling the wrist

-Inf

Inf

r_wrist_roll_joint

hinge

angle (rad)

7

Rotational velocity of the panning the shoulder

-Inf

Inf

r_shoulder_pan_joint

hinge

angular velocity (rad/s)

8

Rotational velocity of the shoulder lifting joint

-Inf

Inf

r_shoulder_lift_joint

hinge

angular velocity (rad/s)

9

Rotational velocity of the shoulder rolling joint

-Inf

Inf

r_upper_arm_roll_joint

hinge

angular velocity (rad/s)

10

Rotational velocity of hinge joint that flexed the elbow

-Inf

Inf

r_elbow_flex_joint

hinge

angular velocity (rad/s)

11

Rotational velocity of hinge that rolls the forearm

-Inf

Inf

r_forearm_roll_joint

hinge

angular velocity (rad/s)

12

Rotational velocity of flexing the wrist

-Inf

Inf

r_wrist_flex_joint

hinge

angular velocity (rad/s)

13

Rotational velocity of rolling the wrist

-Inf

Inf

r_wrist_roll_joint

hinge

angular velocity (rad/s)

14

x-coordinate of the fingertip of the pusher

-Inf

Inf

tips_arm

slide

position (m)

15

y-coordinate of the fingertip of the pusher

-Inf

Inf

tips_arm

slide

position (m)

16

z-coordinate of the fingertip of the pusher

-Inf

Inf

tips_arm

slide

position (m)

17

x-coordinate of the object to be moved

-Inf

Inf

object (obj_slidex)

slide

position (m)

18

y-coordinate of the object to be moved

-Inf

Inf

object (obj_slidey)

slide

position (m)

19

z-coordinate of the object to be moved

-Inf

Inf

object

cylinder

position (m)

20

x-coordinate of the goal position of the object

-Inf

Inf

goal (goal_slidex)

slide

position (m)

21

y-coordinate of the goal position of the object

-Inf

Inf

goal (goal_slidey)

slide

position (m)

22

z-coordinate of the goal position of the object

-Inf

Inf

goal

sphere

position (m)

To understand the state space, an analogy can be drawn to a human arm, where the words “flex” and “roll” have the same meaning as in human joints.

Rewards#

The total reward is: reward = reward_dist + reward_ctrl + reward_near.

  • reward_near: This reward is a measure of how far the fingertip of the pusher (the unattached end) is from the object, with a more negative value assigned for when the pusher’s fingertip is further away from the target. It is \(-w_{near} \|(P_{fingertip} - P_{target})\|_2\). where \(w_{near}\) is the reward_near_weight (default is \(0.5\)).

  • reward_dist: This reward is a measure of how far the object is from the target goal position, with a more negative value assigned if the object is further away from the target. It is \(-w_{dist} \|(P_{object} - P_{target})\|_2\). where \(w_{dist}\) is the reward_dist_weight (default is \(1\)).

  • reward_control: A negative reward to penalize the pusher for taking actions that are too large. It is measured as the negative squared Euclidean norm of the action, i.e. as \(-w_{control} \|action\|_2^2\). where \(w_{control}\) is the reward_control_weight (default is \(0.1\)).

info contains the individual reward terms.

Starting State#

The initial position state of the Pusher arm is \(0_{6}\). The initial position state of the object is \(\mathcal{U}_{[[-0.3, -0.2], [0, 0.2]]}\). The position state of the goal is (permanently) \([0.45, -0.05, -0.323]\). The initial velocity state of the Pusher arm is \(\mathcal{U}_{[-0.005 \times I_{6}, 0.005 \times I_{6}]}\). The initial velocity state of the object is \(0_2\). The velocity state of the goal is (permanently) \(0_3\).

where \(\mathcal{U}\) is the multivariate uniform continuous distribution.

Note that the initial position state of the object is sampled until its distance to the goal is \( > 0.17 m\).

The default frame rate is 5, with each frame lasting 0.01, so dt = 5 * 0.01 = 0.05.

Episode End#

Termination#

The Pusher never terminates.

Truncation#

The default duration of an episode is 100 timesteps.

Arguments#

Pusher provides a range of parameters to modify the observation space, reward function, initial state, and termination condition. These parameters can be applied during gymnasium.make in the following way:

import gymnasium as gym
env = gym.make('Pusher-v5', xml_file=...)

Parameter

Type

Default

Description

xml_file

str

"pusher.xml"

Path to a MuJoCo model

reward_near_weight

float

0.5

Weight for reward_near term (see Rewards section)

reward_dist_weight

float

1

Weight for reward_dist term (see Rewards section)

reward_control_weight

float

0.1

Weight for reward_control term (see Rewards section)

Version History#

  • v5:

    • Minimum mujoco version is now 2.3.3.

    • Added default_camera_config argument, a dictionary for setting the mj_camera properties, mainly useful for custom environments.

    • Added frame_skip argument, used to configure the dt (duration of step()), default varies by environment check environment documentation pages.

    • Added xml_file argument.

    • Fixed bug: reward_distance & reward_near was based on the state before the physics step, now it is based on the state after the physics step (related GitHub issue).

    • Added reward_near_weight, reward_dist_weight, reward_control_weight arguments to configure the reward function (defaults are effectively the same as in v4).

    • Fixed info["reward_ctrl"] not being multiplied by the reward weight.

    • Added info["reward_near"] which is equal to the reward term reward_near.

  • v4: All MuJoCo environments now use the MuJoCo bindings in mujoco >= 2.1.3.

  • v3: This environment does not have a v3 release.

  • v2: All continuous control environments now use mujoco-py >= 1.50.

  • v1: max_time_steps raised to 1000 for robot based tasks (not including pusher, which has a max_time_steps of 100). Added reward_threshold to environments.

  • v0: Initial versions release.