Classes
The following classes are available globally.

Description
This environment is based on the environment introduced by Schulman, Moritz, Levine, Jordan and Abbeel in “HighDimensional Continuous Control Using Generalized Advantage Estimation”. The ant is a 3D robot consisting of one torso (free rotational body) with four legs attached to it with each leg having two links. The goal is to coordinate the four legs to move in the forward (right) direction by applying torques on the eight hinges connecting the two links of each leg and the torso (nine parts and eight hinges).
Action Space
The action space is a
Box(1, 1, (8,), float32)
. An action represents the torques applied at the hinge joints.  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit   —  —————————————————————–  ———–  ———–  ——————————–  —–  ————   0  Torque applied on the rotor between the torso and front left hip  1  1  hip_1 (front_left_leg)  hinge  torque (N m)   1  Torque applied on the rotor between the front left two links  1  1  angle_1 (front_left_leg)  hinge  torque (N m)   2  Torque applied on the rotor between the torso and front right hip  1  1  hip_2 (front_right_leg)  hinge  torque (N m)   3  Torque applied on the rotor between the front right two links  1  1  angle_2 (front_right_leg)  hinge  torque (N m)   4  Torque applied on the rotor between the torso and back left hip  1  1  hip_3 (back_leg)  hinge  torque (N m)   5  Torque applied on the rotor between the back left two links  1  1  angle_3 (back_leg)  hinge  torque (N m)   6  Torque applied on the rotor between the torso and back right hip  1  1  hip_4 (right_back_leg)  hinge  torque (N m)   7  Torque applied on the rotor between the back right two links  1  1  angle_4 (right_back_leg)  hinge  torque (N m) Observation Space
Observations consist of positional values of different body parts of the ant, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. By default, observations do not include the x and ycoordinates of the ant’s torso. These may be included by passing
exclude_current_positions_from_observation=False
during construction. In that case, the observation space will have 113 dimensions where the first two dimensions represent the x and y coordinates of the ant’s torso. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the x and ycoordinates of the torso will be returned ininfo
with keys"x_position"
and"y_position"
, respectively. However, by default, an observation is andarray
with shape(111,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit  —–————————————————————–——–——–———————————————————————–  0  zcoordinate of the torso (centre)  Inf  Inf  torso  free  position (m)   1  xorientation of the torso (centre)  Inf  Inf  torso  free  angle (rad)   2  yorientation of the torso (centre)  Inf  Inf  torso  free  angle (rad)   3  zorientation of the torso (centre)  Inf  Inf  torso  free  angle (rad)   4  worientation of the torso (centre)  Inf  Inf  torso  free  angle (rad)   5  angle between torso and first link on front left  Inf  Inf  hip_1 (front_left_leg)  hinge  angle (rad)   6  angle between the two links on the front left  Inf  Inf  ankle_1 (front_left_leg)  hinge  angle (rad)   7  angle between torso and first link on front right  Inf  Inf  hip_2 (front_right_leg)  hinge  angle (rad)   8  angle between the two links on the front right  Inf  Inf  ankle_2 (front_right_leg)  hinge  angle (rad)   9  angle between torso and first link on back left  Inf  Inf  hip_3 (back_leg)  hinge  angle (rad)   10  angle between the two links on the back left  Inf  Inf  ankle_3 (back_leg)  hinge  angle (rad)   11  angle between torso and first link on back right  Inf  Inf  hip_4 (right_back_leg)  hinge  angle (rad)   12  angle between the two links on the back right  Inf  Inf  ankle_4 (right_back_leg)  hinge  angle (rad)   13  xcoordinate velocity of the torso  Inf  Inf  torso  free  velocity (m/s)   14  ycoordinate velocity of the torso  Inf  Inf  torso  free  velocity (m/s)   15  zcoordinate velocity of the torso  Inf  Inf  torso  free  velocity (m/s)   16  xcoordinate angular velocity of the torso  Inf  Inf  torso  free  angular velocity (rad/s)   17  ycoordinate angular velocity of the torso  Inf  Inf  torso  free  angular velocity (rad/s)   18  zcoordinate angular velocity of the torso  Inf  Inf  torso  free  angular velocity (rad/s)   19  angular velocity of angle between torso and front left link  Inf  Inf  hip_1 (front_left_leg)  hinge  angle (rad)   20  angular velocity of the angle between front left links  Inf  Inf  ankle_1 (front_left_leg)  hinge  angle (rad)   21  angular velocity of angle between torso and front right link  Inf  Inf  hip_2 (front_right_leg)  hinge  angle (rad)   22  angular velocity of the angle between front right links  Inf  Inf  ankle_2 (front_right_leg)  hinge  angle (rad)   23  angular velocity of angle between torso and back left link  Inf  Inf  hip_3 (back_leg)  hinge  angle (rad)   24  angular velocity of the angle between back left links  Inf  Inf  ankle_3 (back_leg)  hinge  angle (rad)   25  angular velocity of angle between torso and back right link  Inf  Inf  hip_4 (right_back_leg)  hinge  angle (rad)   26 angular velocity of the angle between back right links  Inf  Inf  ankle_4 (right_back_leg)  hinge  angle (rad)  The remaining 14*6 = 84 elements of the observation are contact forces (external forces  force x, y, z and torque x, y, z) applied to the center of mass of each of the links. The 14 links are: the ground link, the torso link, and 3 links for each leg (1 + 1 + 12) with the 6 external forces. The (x,y,z) coordinates are translational DOFs while the orientations are rotational DOFs expressed as quaternions. One can read more about free joints on the Mujoco Documentation. Note: Antv4 environment no longer has the following contact forces issue. If using previous Humanoid versions from v4, there have been reported issues that using a MujocoPy version > 2.0 results in the contact forces always being 0. As such we recommend to use a MujocoPy version < 2.0 when using the Ant environment if you would like to report results with contact forces (if contact forces are not used in your experiments, you can use version > 2.0).Rewards
The reward consists of three parts:
 healthy_reward: Every timestep that the ant is healthy (see definition in section “Episode Termination”), it gets a reward of fixed value
healthy_reward
 forward_reward: A reward of moving forward which is measured as
(xcoordinate before action  xcoordinate after action)/dt. dt is the time
between actions and is dependent on the
frame_skip
parameter (default is 5), where the frametime is 0.01  making the default dt = 5 * 0.01 = 0.05. This reward would be positive if the ant moves forward (in positive x direction).  ctrl_cost: A negative reward for penalising the ant if it takes actions
that are too large. It is measured as
ctrl_cost_weight
* sum(action^{2}) wherectr_cost_weight
is a parameter set for the control and has a default value of 0.5.  contact_cost: A negative reward for penalising the ant if the external contact
force is too large. It is calculated
contact_cost_weight
* sum(clip(external contact force tocontact_force_range
)^{2}). The total reward returned is reward = healthy_reward + forward_reward  ctrl_cost  contact_cost andinfo
will also contain the individual reward terms. ### Starting State All observations start in state (0.0, 0.0, 0.75, 1.0, 0.0 … 0.0) with a uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] added to the positional values and standard normal noise with mean 0 and standard deviationreset_noise_scale
added to the velocity values for stochasticity. Note that the initial z coordinate is intentionally selected to be slightly high, thereby indicating a standing up ant. The initial orientation is designed to make it face forward as well. ### Episode End The ant is said to be unhealthy if any of the following happens:  Any of the state space values is no longer finite
 The zcoordinate of the torso is not in the closed interval given by
healthy_z_range
(defaults to [0.2, 1.0]) Ifterminate_when_unhealthy=True
is passed during construction (which is the default), the episode ends when any of the following happens:  Truncation: The episode duration reaches a 1000 timesteps
 Termination: The ant is unhealthy
If
terminate_when_unhealthy=False
is passed, the episode is ended only when 1000 timesteps are exceeded. ### Arguments No additional arguments are currently supported in v2 and lower.env = gym.make('Antv2')
v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('Antv4', ctrl_cost_weight=0.1, ...)
 Parameter  Type  Default Description  ————————————————–—————————— xml_file
 str "ant.xml"
 Path to a MuJoCo model  ctrl_cost_weight
 float 0.5
 Weight for ctrl_cost term (see section on reward)  contact_cost_weight
 float 5e4
 Weight for contact_cost term (see section on reward)  healthy_reward
 float 1
 Constant reward given if the ant is “healthy” after timestep  terminate_when_unhealthy
 boolTrue
 If true, issue a done signal if the zcoordinate of the torso is no longer in thehealthy_z_range
 healthy_z_range
 tuple (0.2, 1)
 The ant is considered healthy if the zcoordinate of the torso is in this range  contact_force_range
 tuple (1, 1)
 Contact forces are clipped to this range in the computation of contact_cost  reset_noise_scale
 float 0.1
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the x and ycoordinates from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)
 healthy_reward: Every timestep that the ant is healthy (see definition in section “Episode Termination”), it gets a reward of fixed value

A dynamic graph is a workspace for computations. All tensor variables can be tracked from a dynamic graph.
See moreDeclaration
Swift
public final class DynamicGraph

A model is a base class for stateful operations on a dynamic graph. It can be use to construct computations statically, thus, more efficient.
See moreDeclaration
Swift
public class Model

A model builder is a more generic type of model. A model can be quite static, thus, you have to be quite careful to have a model work with dynamic inputs. You cannot use reshape, or anything that can generate fixed tensor outputs from a fixed inputs.
A model builder on the other hand doesn’t have that restriction. When input changes, it simply calls the given builder closure to construct a new model. In such way, you can continue to use reshape etc to assume fixed inputs and outputs, it will just work for dynamic inputs. The newly built model will carry over stateful components (parameters) from the old models, thus, it doesn’t reset your training. This also means you need to make sure parameter shape won’t change when input changes, otherwise we will fatal.
See moreDeclaration
Swift
public final class ModelBuilder<T> : AnyModelBuilder

Description
This environment is based on the work by P. Wawrzyński in “A CatLike Robot RealTime Learning to Run”. The HalfCheetah is a 2dimensional robot consisting of 9 links and 8 joints connecting them (including two paws). The goal is to apply a torque on the joints to make the cheetah run forward (right) as fast as possible, with a positive reward allocated based on the distance moved forward and a negative reward allocated for moving backward. The torso and head of the cheetah are fixed, and the torque can only be applied on the other 6 joints over the front and back thighs (connecting to the torso), shins (connecting to the thighs) and feet (connecting to the shins).
Action Space
The action space is a
Box(1, 1, (6,), float32)
. An action represents the torques applied between links.  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit   —  —————————————  ———–  ———–  ——————————–  —–  ————   0  Torque applied on the back thigh rotor  1  1  bthigh  hinge  torque (N m)   1  Torque applied on the back shin rotor  1  1  bshin  hinge  torque (N m)   2  Torque applied on the back foot rotor  1  1  bfoot  hinge  torque (N m)   3  Torque applied on the front thigh rotor  1  1  fthigh  hinge  torque (N m)   4  Torque applied on the front shin rotor  1  1  fshin  hinge  torque (N m)   5  Torque applied on the front foot rotor  1  1  ffoot  hinge  torque (N m) Observation Space
Observations consist of positional values of different body parts of the cheetah, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. By default, observations do not include the xcoordinate of the cheetah’s center of mass. It may be included by passing
exclude_current_positions_from_observation=False
during construction. In that case, the observation space will have 18 dimensions where the first dimension represents the xcoordinate of the cheetah’s center of mass. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the xcoordinate will be returned ininfo
with key"x_position"
. However, by default, the observation is andarray
with shape(17,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  ————————————  —  —  ——————————–  —–  ————————   0  zcoordinate of the front tip  Inf  Inf  rootz  slide  position (m)   1  angle of the front tip  Inf  Inf  rooty  hinge  angle (rad)   2  angle of the second rotor  Inf  Inf  bthigh  hinge  angle (rad)   3  angle of the second rotor  Inf  Inf  bshin  hinge  angle (rad)   4  velocity of the tip along the xaxis  Inf  Inf  bfoot  hinge  angle (rad)   5  velocity of the tip along the yaxis  Inf  Inf  fthigh  hinge  angle (rad)   6  angular velocity of front tip  Inf  Inf  fshin  hinge  angle (rad)   7  angular velocity of second rotor  Inf  Inf  ffoot  hinge  angle (rad)   8  xcoordinate of the front tip  Inf  Inf  rootx  slide  velocity (m/s)   9  ycoordinate of the front tip  Inf  Inf  rootz  slide  velocity (m/s)   10  angle of the front tip  Inf  Inf  rooty  hinge  angular velocity (rad/s)   11  angle of the second rotor  Inf  Inf  bthigh  hinge  angular velocity (rad/s)   12  angle of the second rotor  Inf  Inf  bshin  hinge  angular velocity (rad/s)   13  velocity of the tip along the xaxis  Inf  Inf  bfoot  hinge  angular velocity (rad/s)   14  velocity of the tip along the yaxis  Inf  Inf  fthigh  hinge  angular velocity (rad/s)   15  angular velocity of front tip  Inf  Inf  fshin  hinge  angular velocity (rad/s)   16  angular velocity of second rotor  Inf  Inf  ffoot  hinge  angular velocity (rad/s) Rewards
The reward consists of two parts:
 forward_reward: A reward of moving forward which is measured
as
forward_reward_weight
* (xcoordinate before action  xcoordinate after action)/dt. dt is the time between actions and is dependent on the frame_skip parameter (fixed to 5), where the frametime is 0.01  making the default dt = 5 * 0.01 = 0.05. This reward would be positive if the cheetah runs forward (right).  ctrl_cost: A cost for penalising the cheetah if it takes
actions that are too large. It is measured as
ctrl_cost_weight
* sum(action^{2}) wherectrl_cost_weight
is a parameter set for the control and has a default value of 0.1 The total reward returned is reward = forward_reward  ctrl_cost andinfo
will also contain the individual reward terms ### Starting State All observations start in state (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,) with a noise added to the initial state for stochasticity. As seen before, the first 8 values in the state are positional and the last 9 values are velocity. A uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] is added to the positional values while a standard normal noise with a mean of 0 and standard deviation ofreset_noise_scale
is added to the initial velocity values of all zeros. ### Episode End The episode truncates when the episode length is greater than 1000. ### Arguments No additional arguments are currently supported in v2 and lower.env = gym.make('HalfCheetahv2')
v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('HalfCheetahv4', ctrl_cost_weight=0.1, ....)
 Parameter  Type  Default  Description   ——————————————–  ———  ——————–  —————————————————————————————————————————————————————–  xml_file
 str "half_cheetah.xml"
 Path to a MuJoCo model  forward_reward_weight
 float 1.0
 Weight for forward_reward term (see section on reward)  ctrl_cost_weight
 float 0.1
 Weight for ctrl_cost weight (see section on reward)  reset_noise_scale
 float 0.1
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the xcoordinate from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)
 forward_reward: A reward of moving forward which is measured
as

Description
This environment is based on the work done by Erez, Tassa, and Todorov in “Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks”. The environment aims to increase the number of independent state and control variables as compared to the classic control environments. The hopper is a twodimensional onelegged figure that consist of four main body parts  the torso at the top, the thigh in the middle, the leg in the bottom, and a single foot on which the entire body rests. The goal is to make hops that move in the forward (right) direction by applying torques on the three hinges connecting the four body parts.
Action Space
The action space is a
Box(1, 1, (3,), float32)
. An action represents the torques applied between links  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–—————————————————————————————————————–  0  Torque applied on the thigh rotor  1  1  thigh_joint  hinge  torque (N m)   1  Torque applied on the leg rotor  1  1  leg_joint  hinge  torque (N m)   3  Torque applied on the foot rotor  1  1  foot_joint  hinge  torque (N m) Observation Space
Observations consist of positional values of different body parts of the hopper, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. By default, observations do not include the xcoordinate of the hopper. It may be included by passing
exclude_current_positions_from_observation=False
during construction. In that case, the observation space will have 12 dimensions where the first dimension represents the xcoordinate of the hopper. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the xcoordinate will be returned ininfo
with key"x_position"
. However, by default, the observation is andarray
with shape(11,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  ————————————————  —  —  ——————————–  —–  ————————   0  zcoordinate of the top (height of hopper)  Inf  Inf  rootz  slide  position (m)   1  angle of the top  Inf  Inf  rooty  hinge  angle (rad)   2  angle of the thigh joint  Inf  Inf  thigh_joint  hinge  angle (rad)   3  angle of the leg joint  Inf  Inf  leg_joint  hinge  angle (rad)   4  angle of the foot joint  Inf  Inf  foot_joint  hinge  angle (rad)   5  velocity of the xcoordinate of the top  Inf  Inf  rootx  slide  velocity (m/s)   6  velocity of the zcoordinate (height) of the top  Inf  Inf  rootz  slide  velocity (m/s)   7  angular velocity of the angle of the top  Inf  Inf  rooty  hinge  angular velocity (rad/s)   8  angular velocity of the thigh hinge  Inf  Inf  thigh_joint  hinge  angular velocity (rad/s)   9  angular velocity of the leg hinge  Inf  Inf  leg_joint  hinge  angular velocity (rad/s)   10  angular velocity of the foot hinge  Inf  Inf  foot_joint  hinge  angular velocity (rad/s) Rewards
The reward consists of three parts:
 healthy_reward: Every timestep that the hopper is healthy (see definition in section “Episode Termination”), it gets a reward of fixed value
healthy_reward
.  forward_reward: A reward of hopping forward which is measured
as
forward_reward_weight
* (xcoordinate before action  xcoordinate after action)/dt. dt is the time between actions and is dependent on the frame_skip parameter (fixed to 4), where the frametime is 0.002  making the default dt = 4 * 0.002 = 0.008. This reward would be positive if the hopper hops forward (positive x direction).  ctrl_cost: A cost for penalising the hopper if it takes
actions that are too large. It is measured as
ctrl_cost_weight
* sum(action^{2}) wherectrl_cost_weight
is a parameter set for the control and has a default value of 0.001 The total reward returned is reward = healthy_reward + forward_reward  ctrl_cost andinfo
will also contain the individual reward terms ### Starting State All observations start in state (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] added to the values for stochasticity. ### Episode End The hopper is said to be unhealthy if any of the following happens:  An element of
observation[1:]
(ifexclude_current_positions_from_observation=True
, elseobservation[2:]
) is no longer contained in the closed interval specified by the argumenthealthy_state_range
 The height of the hopper (
observation[0]
ifexclude_current_positions_from_observation=True
, elseobservation[1]
) is no longer contained in the closed interval specified by the argumenthealthy_z_range
(usually meaning that it has fallen)  The angle (
observation[1]
ifexclude_current_positions_from_observation=True
, elseobservation[2]
) is no longer contained in the closed interval specified by the argumenthealthy_angle_range
Ifterminate_when_unhealthy=True
is passed during construction (which is the default), the episode ends when any of the following happens:  Truncation: The episode duration reaches a 1000 timesteps
 Termination: The hopper is unhealthy
If
terminate_when_unhealthy=False
is passed, the episode is ended only when 1000 timesteps are exceeded. ### Arguments No additional arguments are currently supported in v2 and lower.env = gym.make('Hopperv2')
v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('Hopperv4', ctrl_cost_weight=0.1, ....)
 Parameter  Type  Default  Description   ——————————————–  ———  ———————  ——————————————————————————————————————————————————————————  xml_file
 str "hopper.xml"
 Path to a MuJoCo model  forward_reward_weight
 float 1.0
 Weight for forward_reward term (see section on reward)  ctrl_cost_weight
 float 0.001
 Weight for ctrl_cost reward (see section on reward)  healthy_reward
 float 1
 Constant reward given if the ant is “healthy” after timestep  terminate_when_unhealthy
 bool True
 If true, issue a done signal if the hopper is no longer healthy  healthy_state_range
 tuple (100, 100)
 The elements ofobservation[1:]
(ifexclude_current_positions_from_observation=True
, elseobservation[2:]
) must be in this range for the hopper to be considered healthy  healthy_z_range
 tuple (0.7, float("inf"))
 The zcoordinate must be in this range for the hopper to be considered healthy  healthy_angle_range
 tuple (0.2, 0.2)
 The angle given byobservation[1]
(ifexclude_current_positions_from_observation=True
, elseobservation[2]
) must be in this range for the hopper to be considered healthy  reset_noise_scale
 float 5e3
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the xcoordinate from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)
 healthy_reward: Every timestep that the hopper is healthy (see definition in section “Episode Termination”), it gets a reward of fixed value

Declaration
Swift
public class HTTPRenderServer

Description
This environment is based on the environment introduced by Tassa, Erez and Todorov in “Synthesis and stabilization of complex behaviors through online trajectory optimization”. The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and arms. The legs each consist of two links, and so the arms (representing the knees and elbows respectively). The goal of the environment is to walk forward as fast as possible without falling over.
Action Space
The action space is a
Box(1, 1, (17,), float32)
. An action represents the torques applied at the hinge joints.  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–——————————————————————————————————  0  Torque applied on the hinge in the ycoordinate of the abdomen  0.4  0.4  hip_1 (front_left_leg)  hinge  torque (N m)   1  Torque applied on the hinge in the zcoordinate of the abdomen  0.4  0.4  angle_1 (front_left_leg)  hinge  torque (N m)   2  Torque applied on the hinge in the xcoordinate of the abdomen  0.4  0.4  hip_2 (front_right_leg)  hinge  torque (N m)   3  Torque applied on the rotor between torso/abdomen and the right hip (xcoordinate)  0.4  0.4  right_hip_x (right_thigh)  hinge  torque (N m)   4  Torque applied on the rotor between torso/abdomen and the right hip (zcoordinate)  0.4  0.4  right_hip_z (right_thigh)  hinge  torque (N m)   5  Torque applied on the rotor between torso/abdomen and the right hip (ycoordinate)  0.4  0.4  right_hip_y (right_thigh)  hinge  torque (N m)   6  Torque applied on the rotor between the right hip/thigh and the right shin  0.4  0.4  right_knee  hinge  torque (N m)   7  Torque applied on the rotor between torso/abdomen and the left hip (xcoordinate)  0.4  0.4  left_hip_x (left_thigh)  hinge  torque (N m)   8  Torque applied on the rotor between torso/abdomen and the left hip (zcoordinate)  0.4  0.4  left_hip_z (left_thigh)  hinge  torque (N m)   9  Torque applied on the rotor between torso/abdomen and the left hip (ycoordinate)  0.4  0.4  left_hip_y (left_thigh)  hinge  torque (N m)   10  Torque applied on the rotor between the left hip/thigh and the left shin  0.4  0.4  left_knee  hinge  torque (N m)   11  Torque applied on the rotor between the torso and right upper arm (coordinate 1)  0.4  0.4  right_shoulder1  hinge  torque (N m)   12  Torque applied on the rotor between the torso and right upper arm (coordinate 2)  0.4  0.4  right_shoulder2  hinge  torque (N m)   13  Torque applied on the rotor between the right upper arm and right lower arm  0.4  0.4  right_elbow  hinge  torque (N m)   14  Torque applied on the rotor between the torso and left upper arm (coordinate 1)  0.4  0.4  left_shoulder1  hinge  torque (N m)   15  Torque applied on the rotor between the torso and left upper arm (coordinate 2)  0.4  0.4  left_shoulder2  hinge  torque (N m)   16  Torque applied on the rotor between the left upper arm and left lower arm  0.4  0.4  left_elbow  hinge  torque (N m) Observation Space
Observations consist of positional values of different body parts of the Humanoid, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. By default, observations do not include the x and ycoordinates of the torso. These may be included by passing
exclude_current_positions_from_observation=False
during construction. In that case, the observation space will have 378 dimensions where the first two dimensions represent the x and ycoordinates of the torso. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the x and ycoordinates will be returned ininfo
with keys"x_position"
and"y_position"
, respectively. However, by default, the observation is andarray
with shape(376,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  —————————————————————————————————————  —  —  ——————————–  —–  ————————–   0  zcoordinate of the torso (centre)  Inf  Inf  root  free  position (m)   1  xorientation of the torso (centre)  Inf  Inf  root  free  angle (rad)   2  yorientation of the torso (centre)  Inf  Inf  root  free  angle (rad)   3  zorientation of the torso (centre)  Inf  Inf  root  free  angle (rad)   4  worientation of the torso (centre)  Inf  Inf  root  free  angle (rad)   5  zangle of the abdomen (in lower_waist)  Inf  Inf  abdomen_z  hinge  angle (rad)   6  yangle of the abdomen (in lower_waist)  Inf  Inf  abdomen_y  hinge  angle (rad)   7  xangle of the abdomen (in pelvis)  Inf  Inf  abdomen_x  hinge  angle (rad)   8  xcoordinate of angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_x  hinge  angle (rad)   9  zcoordinate of angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_z  hinge  angle (rad)   19  ycoordinate of angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_y  hinge  angle (rad)   11  angle between right hip and the right shin (in right_knee)  Inf  Inf  right_knee  hinge  angle (rad)   12  xcoordinate of angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_x  hinge  angle (rad)   13  zcoordinate of angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_z  hinge  angle (rad)   14  ycoordinate of angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_y  hinge  angle (rad)   15  angle between left hip and the left shin (in left_knee)  Inf  Inf  left_knee  hinge  angle (rad)   16  coordinate1 (multiaxis) angle between torso and right arm (in right_upper_arm)  Inf  Inf  right_shoulder1  hinge  angle (rad)   17  coordinate2 (multiaxis) angle between torso and right arm (in right_upper_arm)  Inf  Inf  right_shoulder2  hinge  angle (rad)   18  angle between right upper arm and right_lower_arm  Inf  Inf  right_elbow  hinge  angle (rad)   19  coordinate1 (multiaxis) angle between torso and left arm (in left_upper_arm)  Inf  Inf  left_shoulder1  hinge  angle (rad)   20  coordinate2 (multiaxis) angle between torso and left arm (in left_upper_arm)  Inf  Inf  left_shoulder2  hinge  angle (rad)   21  angle between left upper arm and left_lower_arm  Inf  Inf  left_elbow  hinge  angle (rad)   22  xcoordinate velocity of the torso (centre)  Inf  Inf  root  free  velocity (m/s)   23  ycoordinate velocity of the torso (centre)  Inf  Inf  root  free  velocity (m/s)   24  zcoordinate velocity of the torso (centre)  Inf  Inf  root  free  velocity (m/s)   25  xcoordinate angular velocity of the torso (centre)  Inf  Inf  root  free  anglular velocity (rad/s)   26  ycoordinate angular velocity of the torso (centre)  Inf  Inf  root  free  anglular velocity (rad/s)   27  zcoordinate angular velocity of the torso (centre)  Inf  Inf  root  free  anglular velocity (rad/s)   28  zcoordinate of angular velocity of the abdomen (in lower_waist)  Inf  Inf  abdomen_z  hinge  anglular velocity (rad/s)   29  ycoordinate of angular velocity of the abdomen (in lower_waist)  Inf  Inf  abdomen_y  hinge  anglular velocity (rad/s)   30  xcoordinate of angular velocity of the abdomen (in pelvis)  Inf  Inf  abdomen_x  hinge  aanglular velocity (rad/s)   31  xcoordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_x  hinge  anglular velocity (rad/s)   32  zcoordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_z  hinge  anglular velocity (rad/s)   33  ycoordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh)  Inf  Inf  right_hip_y  hinge  anglular velocity (rad/s)   34  angular velocity of the angle between right hip and the right shin (in right_knee)  Inf  Inf  right_knee  hinge  anglular velocity (rad/s)   35  xcoordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_x  hinge  anglular velocity (rad/s)   36  zcoordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_z  hinge  anglular velocity (rad/s)   37  ycoordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh)  Inf  Inf  left_hip_y  hinge  anglular velocity (rad/s)   38  angular velocity of the angle between left hip and the left shin (in left_knee)  Inf  Inf  left_knee  hinge  anglular velocity (rad/s)   39  coordinate1 (multiaxis) of the angular velocity of the angle between torso and right arm (in right_upper_arm)  Inf  Inf  right_shoulder1  hinge  anglular velocity (rad/s)   40  coordinate2 (multiaxis) of the angular velocity of the angle between torso and right arm (in right_upper_arm)  Inf  Inf  right_shoulder2  hinge  anglular velocity (rad/s)   41  angular velocity of the angle between right upper arm and right_lower_arm  Inf  Inf  right_elbow  hinge  anglular velocity (rad/s)   42  coordinate1 (multiaxis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)  Inf  Inf  left_shoulder1  hinge  anglular velocity (rad/s)   43  coordinate2 (multiaxis) of the angular velocity of the angle between torso and left arm (in left_upper_arm)  Inf  Inf  left_shoulder2  hinge  anglular velocity (rad/s)   44  angular velocitty of the angle between left upper arm and left_lower_arm  Inf  Inf  left_elbow  hinge  anglular velocity (rad/s)  Additionally, after all the positional and velocity based values in the table, the observation contains (in order): cinert: Mass and inertia of a single rigid body relative to the center of mass (this is an intermediate result of transition). It has shape 14*10 (nbody * 10) and hence adds to another 140 elements in the state space.
 cvel: Center of mass based velocity. It has shape 14 * 6 (nbody * 6) and hence adds another 84 elements in the state space
 qfrc_actuator: Constraint force generated as the actuator force. This has shape
(23,)
(nv * 1) and hence adds another 23 elements to the state space.  cfrc_ext: This is the center of mass based external force on the body. It has shape 14 * 6 (nbody * 6) and hence adds to another 84 elements in the state space. where nbody stands for the number of bodies in the robot and nv stands for the number of degrees of freedom (= dim(qvel)) The (x,y,z) coordinates are translational DOFs while the orientations are rotational DOFs expressed as quaternions. One can read more about free joints on the Mujoco Documentation. Note: Humanoidv4 environment no longer has the following contact forces issue. If using previous Humanoid versions from v4, there have been reported issues that using a MujocoPy version > 2.0 results in the contact forces always being 0. As such we recommend to use a MujocoPy version < 2.0 when using the Humanoid environment if you would like to report results with contact forces (if contact forces are not used in your experiments, you can use version > 2.0). ### Rewards The reward consists of three parts:
 healthy_reward: Every timestep that the humanoid is alive (see section Episode Termination for definition), it gets a reward of fixed value
healthy_reward
 forward_reward: A reward of walking forward which is measured as
forward_reward_weight
* (average center of mass before action  average center of mass after action)/dt. dt is the time between actions and is dependent on the frame_skip parameter (default is 5), where the frametime is 0.003  making the default dt = 5 * 0.003 = 0.015. This reward would be positive if the humanoid walks forward (in positive xdirection). The calculation for the center of mass is defined in the.py
file for the Humanoid.  ctrl_cost: A negative reward for penalising the humanoid if it has too
large of a control force. If there are nu actuators/controls, then the control has
shape
nu x 1
. It is measured asctrl_cost_weight
* sum(control^{2}).  contact_cost: A negative reward for penalising the humanoid if the external
contact force is too large. It is calculated by clipping
contact_cost_weight
* sum(external contact force^{2}) to the interval specified bycontact_cost_range
. The total reward returned is reward = healthy_reward + forward_reward  ctrl_cost  contact_cost andinfo
will also contain the individual reward terms ### Starting State All observations start in state (0.0, 0.0, 1.4, 1.0, 0.0 … 0.0) with a uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] added to the positional and velocity values (values in the table) for stochasticity. Note that the initial z coordinate is intentionally selected to be high, thereby indicating a standing up humanoid. The initial orientation is designed to make it face forward as well. ### Episode End The humanoid is said to be unhealthy if the zposition of the torso is no longer contained in the closed interval specified by the argumenthealthy_z_range
. Ifterminate_when_unhealthy=True
is passed during construction (which is the default), the episode ends when any of the following happens:  Truncation: The episode duration reaches a 1000 timesteps
 Termination: The humanoid is unhealthy
If
terminate_when_unhealthy=False
is passed, the episode is ended only when 1000 timesteps are exceeded. ### Arguments No additional arguments are currently supported in v2 and lower.env = gym.make('Humanoidv4')
v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('Humanoidv4', ctrl_cost_weight=0.1, ....)
 Parameter  Type  Default  Description   ——————————————–  ———  —————  ————————————————————————————————————————————————————————  xml_file
 str "humanoid.xml"
 Path to a MuJoCo model  forward_reward_weight
 float 1.25
 Weight for forward_reward term (see section on reward)  ctrl_cost_weight
 float 0.1
 Weight for ctrl_cost term (see section on reward)  contact_cost_weight
 float 5e7
 Weight for contact_cost term (see section on reward)  healthy_reward
 float 5.0
 Constant reward given if the humanoid is “healthy” after timestep  terminate_when_unhealthy
 bool True
 If true, issue a done signal if the zcoordinate of the torso is no longer in thehealthy_z_range
 healthy_z_range
 tuple (1.0, 2.0)
 The humanoid is considered healthy if the zcoordinate of the torso is in this range  reset_noise_scale
 float 1e2
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the x and ycoordinates from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)

Description
This environment originates from control theory and builds on the cartpole environment based on the work done by Barto, Sutton, and Anderson in “Neuronlike adaptive elements that can solve difficult learning control problems”, powered by the Mujoco physics simulator  allowing for more complex experiments (such as varying the effects of gravity or constraints). This environment involves a cart that can moved linearly, with a pole fixed on it and a second pole fixed on the other end of the first one (leaving the second pole as the only one with one free end). The cart can be pushed left or right, and the goal is to balance the second pole on top of the first pole, which is in turn on top of the cart, by applying continuous forces on the cart.
Action Space
The agent take a 1element vector for actions. The action space is a continuous
(action)
in[1, 1]
, whereaction
represents the numerical force applied to the cart (with magnitude representing the amount of force and sign representing the direction)  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–—————————————————————————————————–  0  Force applied on the cart  1  1  slider  slide  Force (N) Observation Space
The state space consists of positional values of different body parts of the pendulum system, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. The observation is a
ndarray
with shape(11,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  —————————————————————–  —  —  ——————————–  —–  ————————   0  position of the cart along the linear surface  Inf  Inf  slider  slide  position (m)   1  sine of the angle between the cart and the first pole  Inf  Inf  sin(hinge)  hinge  unitless   2  sine of the angle between the two poles  Inf  Inf  sin(hinge2)  hinge  unitless   3  cosine of the angle between the cart and the first pole  Inf  Inf  cos(hinge)  hinge  unitless   4  cosine of the angle between the two poles  Inf  Inf  cos(hinge2)  hinge  unitless   5  velocity of the cart  Inf  Inf  slider  slide  velocity (m/s)   6  angular velocity of the angle between the cart and the first pole  Inf  Inf  hinge  hinge  angular velocity (rad/s)   7  angular velocity of the angle between the two poles  Inf  Inf  hinge2  hinge  angular velocity (rad/s)   8  constraint force  1  Inf  Inf    Force (N)   9  constraint force  2  Inf  Inf    Force (N)   10  constraint force  3  Inf  Inf    Force (N)  There is physical contact between the robots and their environment  and Mujoco attempts at getting realisitic physics simulations for the possible physical contact dynamics by aiming for physical accuracy and computational efficiency. There is one constraint force for contacts for each degree of freedom (3). The approach and handling of constraints by Mujoco is unique to the simulator and is based on their research. Once can find more information in their documentation or in their paper “Analyticallyinvertible dynamics with contacts and constraints: Theory and implementation in MuJoCo”.Rewards
The reward consists of two parts:
 alive_bonus: The goal is to make the second inverted pendulum stand upright (within a certain angle limit) as long as possible  as such a reward of +10 is awarded for each timestep that the second pole is upright.
 distance_penalty: This reward is a measure of how far the tip of the second pendulum (the only free end) moves, and it is calculated as 0.01 * x^{2} + (y  2)^{2}, where x is the xcoordinate of the tip and y is the ycoordinate of the tip of the second pole.
 velocity_penalty: A negative reward for penalising the agent if it moves too
fast 0.001 * v_{1}^{2} + 0.005 * v_{2} ^{2}
The total reward returned is reward = alive_bonus  distance_penalty  velocity_penalty
### Starting State
All observations start in state
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise in the range
of [0.1, 0.1] added to the positional values (cart position and pole angles) and standard
normal force with a standard deviation of 0.1 added to the velocity values for stochasticity.
### Episode End
The episode ends when any of the following happens:
1.Truncation: The episode duration reaches 1000 timesteps.
2.Termination: Any of the state space values is no longer finite.
3.Termination: The y_coordinate of the tip of the second pole is less than or equal to 1. The maximum standing height of the system is 1.196 m when all the parts are perpendicularly vertical on top of each other).
### Arguments
No additional arguments are currently supported.
env = gym.make('InvertedDoublePendulumv4')
There is no v3 for InvertedPendulum, unlike the robot environments where a v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)
 v0: Initial versions release (1.0.0)

Description
This environment is the cartpole environment based on the work done by Barto, Sutton, and Anderson in “Neuronlike adaptive elements that can solve difficult learning control problems”, just like in the classic environments but now powered by the Mujoco physics simulator  allowing for more complex experiments (such as varying the effects of gravity). This environment involves a cart that can moved linearly, with a pole fixed on it at one end and having another end free. The cart can be pushed left or right, and the goal is to balance the pole on the top of the cart by applying forces on the cart.
Action Space
The agent take a 1element vector for actions. The action space is a continuous
(action)
in[3, 3]
, whereaction
represents the numerical force applied to the cart (with magnitude representing the amount of force and sign representing the direction)  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–—————————————————————————————————–  0  Force applied on the cart  3  3  slider  slide  Force (N) Observation Space
The state space consists of positional values of different body parts of the pendulum system, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. The observation is a
ndarray
with shape(4,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  ———————————————  —  —  ——————————–  —–  ————————   0  position of the cart along the linear surface  Inf  Inf  slider  slide  position (m)   1  vertical angle of the pole on the cart  Inf  Inf  hinge  hinge  angle (rad)   2  linear velocity of the cart  Inf  Inf  slider  slide  velocity (m/s)   3  angular velocity of the pole on the cart  Inf  Inf  hinge  hinge  anglular velocity (rad/s) Rewards
The goal is to make the inverted pendulum stand upright (within a certain angle limit) as long as possible  as such a reward of +1 is awarded for each timestep that the pole is upright.
Starting State
All observations start in state (0.0, 0.0, 0.0, 0.0) with a uniform noise in the range of [0.01, 0.01] added to the values for stochasticity.
Episode End
The episode ends when any of the following happens:
 Truncation: The episode duration reaches 1000 timesteps.
 Termination: Any of the state space values is no longer finite.
 Termination: The absolutely value of the vertical angle between the pole and the cart is greater than 0.2 radian.
### Arguments
No additional arguments are currently supported.
env = gym.make('InvertedPendulumv4')
There is no v3 for InvertedPendulum, unlike the robot environments where a v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)
 v0: Initial versions release (1.0.0)

A typeerased model builder.
See moreDeclaration
Swift
public class AnyModelBuilder

Declaration
Swift
public final class MuJoCoVideo<EnvType> where EnvType : MuJoCoEnv
extension MuJoCoVideo: Renderable

Declaration
Swift
public final class MuJoCoViewer<EnvType> where EnvType : MuJoCoEnv
extension MuJoCoViewer: Renderable

A stream context is an object that an execution can be performed upon.
See moreDeclaration
Swift
public final class StreamContext

Description
This environment corresponds to the Swimmer environment described in Rémi Coulom’s PhD thesis “Reinforcement Learning Using Neural Networks, with Applications to Motor Control”. The environment aims to increase the number of independent state and control variables as compared to the classic control environments. The swimmers consist of three or more segments (‘links’) and one less articulation joints (‘rotors’)  one rotor joint connecting exactly two links to form a linear chain. The swimmer is suspended in a two dimensional pool and always starts in the same position (subject to some deviation drawn from an uniform distribution), and the goal is to move as fast as possible towards the right by applying torque on the rotors and using the fluids friction.
Notes
The problem parameters are: Problem parameters:
 n: number of body parts
 m_{i}: mass of part i (i ∈ {1…n})
 l_{i}: length of part i (i ∈ {1…n})
 k: viscousfriction coefficient
While the default environment has n = 3, l_{i} = 0.1,
and k = 0.1. It is possible to pass a custom MuJoCo XML file during construction to increase the
number of links, or to tweak any of the parameters.
### Action Space
The action space is a
Box(1, 1, (2,), float32)
. An action represents the torques applied between links  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–—————————————————————————————————————–  0  Torque applied on the first rotor  1  1  motor1_rot  hinge  torque (N m)   1  Torque applied on the second rotor  1  1  motor2_rot  hinge  torque (N m)  ### Observation Space By default, observations consists of:  θ_{i}: angle of part i with respect to the x axis
 θ_{i}‘: its derivative with respect to time (angular velocity)
In the default case, observations do not include the x and ycoordinates of the front tip. These may
be included by passing
exclude_current_positions_from_observation=False
during construction. Then, the observation space will have 10 dimensions where the first two dimensions represent the x and ycoordinates of the front tip. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the x and ycoordinates will be returned ininfo
with keys"x_position"
and"y_position"
, respectively. By default, the observation is andarray
with shape(8,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  ————————————  —  —  ——————————–  —–  ————————   0  angle of the front tip  Inf  Inf  free_body_rot  hinge  angle (rad)   1  angle of the first rotor  Inf  Inf  motor1_rot  hinge  angle (rad)   2  angle of the second rotor  Inf  Inf  motor2_rot  hinge  angle (rad)   3  velocity of the tip along the xaxis  Inf  Inf  slider1  slide  velocity (m/s)   4  velocity of the tip along the yaxis  Inf  Inf  slider2  slide  velocity (m/s)   5  angular velocity of front tip  Inf  Inf  free_body_rot  hinge  angular velocity (rad/s)   6  angular velocity of first rotor  Inf  Inf  motor1_rot  hinge  angular velocity (rad/s)   7  angular velocity of second rotor  Inf  Inf  motor2_rot  hinge  angular velocity (rad/s)  ### Rewards The reward consists of two parts:  forward_reward: A reward of moving forward which is measured
as
forward_reward_weight
* (xcoordinate before action  xcoordinate after action)/dt. dt is the time between actions and is dependent on the frame_skip parameter (default is 4), where the frametime is 0.01  making the default dt = 4 * 0.01 = 0.04. This reward would be positive if the swimmer swims right as desired.  ctrl_cost: A cost for penalising the swimmer if it takes
actions that are too large. It is measured as
ctrl_cost_weight
* sum(action^{2}) wherectrl_cost_weight
is a parameter set for the control and has a default value of 1e4 The total reward returned is reward = forward_reward  ctrl_cost andinfo
will also contain the individual reward terms ### Starting State All observations start in state (0,0,0,0,0,0,0,0) with a Uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] is added to the initial state for stochasticity. ### Episode End The episode truncates when the episode length is greater than 1000. ### Arguments No additional arguments are currently supported in v2 and lower.gym.make('Swimmerv4')
v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('Swimmerv4', ctrl_cost_weight=0.1, ....)
 Parameter  Type  Default  Description   ——————————————–  ———  —————  ————————————————————————————————————————————————————————  xml_file
 str "swimmer.xml"
 Path to a MuJoCo model  forward_reward_weight
 float 1.0
 Weight for forward_reward term (see section on reward)  ctrl_cost_weight
 float 1e4
 Weight for ctrl_cost term (see section on reward)  reset_noise_scale
 float 0.1
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the x and ycoordinates from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)

Declaration
Swift
public final class CmdParamsFactory

Declaration
Swift
public final class AnyTensorStorage

Description
This environment builds on the hopper environment based on the work done by Erez, Tassa, and Todorov in “Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks” by adding another set of legs making it possible for the robot to walker forward instead of hop. Like other Mujoco environments, this environment aims to increase the number of independent state and control variables as compared to the classic control environments. The walker is a twodimensional twolegged figure that consist of four main body parts  a single torso at the top (with the two legs splitting after the torso), two thighs in the middle below the torso, two legs in the bottom below the thighs, and two feet attached to the legs on which the entire body rests. The goal is to make coordinate both sets of feet, legs, and thighs to move in the forward (right) direction by applying torques on the six hinges connecting the six body parts.
Action Space
The action space is a
Box(1, 1, (6,), float32)
. An action represents the torques applied at the hinge joints.  Num  Action  Control Min  Control Max  Name (in corresponding XML file)  Joint  Unit  —–——————————————————————————————————————–  0  Torque applied on the thigh rotor  1  1  thigh_joint  hinge  torque (N m)   1  Torque applied on the leg rotor  1  1  leg_joint  hinge  torque (N m)   2  Torque applied on the foot rotor  1  1  foot_joint  hinge  torque (N m)   3  Torque applied on the left thigh rotor  1  1  thigh_left_joint  hinge  torque (N m)   4  Torque applied on the left leg rotor  1  1  leg_left_joint  hinge  torque (N m)   5  Torque applied on the left foot rotor  1  1  foot_left_joint  hinge  torque (N m) Observation Space
Observations consist of positional values of different body parts of the walker, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities. By default, observations do not include the xcoordinate of the top. It may be included by passing
exclude_current_positions_from_observation=False
during construction. In that case, the observation space will have 18 dimensions where the first dimension represent the xcoordinates of the top of the walker. Regardless of whetherexclude_current_positions_from_observation
was set to true or false, the xcoordinate of the top will be returned ininfo
with key"x_position"
. By default, observation is andarray
with shape(17,)
where the elements correspond to the following:  Num  Observation  Min  Max  Name (in corresponding XML file)  Joint  Unit   —  ————————————————  —  —  ——————————–  —–  ————————   0  zcoordinate of the top (height of hopper)  Inf  Inf  rootz (torso)  slide  position (m)   1  angle of the top  Inf  Inf  rooty (torso)  hinge  angle (rad)   2  angle of the thigh joint  Inf  Inf  thigh_joint  hinge  angle (rad)   3  angle of the leg joint  Inf  Inf  leg_joint  hinge  angle (rad)   4  angle of the foot joint  Inf  Inf  foot_joint  hinge  angle (rad)   5  angle of the left thigh joint  Inf  Inf  thigh_left_joint  hinge  angle (rad)   6  angle of the left leg joint  Inf  Inf  leg_left_joint  hinge  angle (rad)   7  angle of the left foot joint  Inf  Inf  foot_left_joint  hinge  angle (rad)   8  velocity of the xcoordinate of the top  Inf  Inf  rootx  slide  velocity (m/s)   9  velocity of the zcoordinate (height) of the top  Inf  Inf  rootz  slide  velocity (m/s)   10  angular velocity of the angle of the top  Inf  Inf  rooty  hinge  angular velocity (rad/s)   11  angular velocity of the thigh hinge  Inf  Inf  thigh_joint  hinge  angular velocity (rad/s)   12  angular velocity of the leg hinge  Inf  Inf  leg_joint  hinge  angular velocity (rad/s)   13  angular velocity of the foot hinge  Inf  Inf  foot_joint  hinge  angular velocity (rad/s)   14  angular velocity of the thigh hinge  Inf  Inf  thigh_left_joint  hinge  angular velocity (rad/s)   15  angular velocity of the leg hinge  Inf  Inf  leg_left_joint  hinge  angular velocity (rad/s)   16  angular velocity of the foot hinge  Inf  Inf  foot_left_joint  hinge  angular velocity (rad/s) Rewards
The reward consists of three parts:
 healthy_reward: Every timestep that the walker is alive, it receives a fixed reward of value
healthy_reward
,  forward_reward: A reward of walking forward which is measured as
forward_reward_weight
* (xcoordinate before action  xcoordinate after action)/dt. dt is the time between actions and is dependeent on the frame_skip parameter (default is 4), where the frametime is 0.002  making the default dt = 4 * 0.002 = 0.008. This reward would be positive if the walker walks forward (right) desired.  ctrl_cost: A cost for penalising the walker if it
takes actions that are too large. It is measured as
ctrl_cost_weight
* sum(action^{2}) wherectrl_cost_weight
is a parameter set for the control and has a default value of 0.001 The total reward returned is reward = healthy_reward bonus + forward_reward  ctrl_cost andinfo
will also contain the individual reward terms ### Starting State All observations start in state (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise in the range of [reset_noise_scale
,reset_noise_scale
] added to the values for stochasticity. ### Episode End The walker is said to be unhealthy if any of the following happens:  Any of the state space values is no longer finite
 The height of the walker is not in the closed interval specified by
healthy_z_range
 The absolute value of the angle (
observation[1]
ifexclude_current_positions_from_observation=False
, elseobservation[2]
) is not in the closed interval specified byhealthy_angle_range
Ifterminate_when_unhealthy=True
is passed during construction (which is the default), the episode ends when any of the following happens:  Truncation: The episode duration reaches a 1000 timesteps
 Termination: The walker is unhealthy
If
terminate_when_unhealthy=False
is passed, the episode is ended only when 1000 timesteps are exceeded. ### Arguments No additional arguments are currently supported in v2 and lower.env = gym.make('Walker2dv4')
v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.env = gym.make('Walker2dv4', ctrl_cost_weight=0.1, ....)
 Parameter  Type  Default  Description   ——————————————–  ———  —————  —————————————————————————————————————————————————————–  xml_file
 str "walker2d.xml"
 Path to a MuJoCo model  forward_reward_weight
 float 1.0
 Weight for forward_reward term (see section on reward)  ctrl_cost_weight
 float 1e3
 Weight for ctr_cost term (see section on reward)  healthy_reward
 float 1.0
 Constant reward given if the ant is “healthy” after timestep  terminate_when_unhealthy
 bool True
 If true, issue a done signal if the zcoordinate of the walker is no longer healthy  healthy_z_range
 tuple (0.8, 2)
 The zcoordinate of the top of the walker must be in this range to be considered healthy  healthy_angle_range
 tuple (1, 1)
 The angle must be in this range to be considered healthy  reset_noise_scale
 float 5e3
 Scale of random perturbations of initial position and velocity (see section on Starting State)  exclude_current_positions_from_observation
 bool True
 Whether or not to omit the xcoordinate from observations. Excluding the position can serve as an inductive bias to induce positionagnostic behavior in policies  ### Version History  v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
 v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
 v2: All continuous control environments now use mujoco_py >= 1.50
 v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
 v0: Initial versions release (1.0.0)
 healthy_reward: Every timestep that the walker is alive, it receives a fixed reward of value