Classes Reference

Ant

Description

This environment is based on the environment introduced by Schulman, Moritz, Levine, Jordan and Abbeel in “High-Dimensional Continuous Control Using Generalized Advantage Estimation”. The ant is a 3D robot consisting of one torso (free rotational body) with four legs attached to it with each leg having two links. The goal is to coordinate the four legs to move in the forward (right) direction by applying torques on the eight hinges connecting the two links of each leg and the torso (nine parts and eight hinges).

Action Space

The action space is a Box(-1, 1, (8,), float32). An action represents the torques applied at the hinge joints.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the rotor between the torso and front left hip | -1 | 1 | hip_1 (front_left_leg) | hinge | torque (N m) |
| 1 | Torque applied on the rotor between the front left two links | -1 | 1 | ankle_1 (front_left_leg) | hinge | torque (N m) |
| 2 | Torque applied on the rotor between the torso and front right hip | -1 | 1 | hip_2 (front_right_leg) | hinge | torque (N m) |
| 3 | Torque applied on the rotor between the front right two links | -1 | 1 | ankle_2 (front_right_leg) | hinge | torque (N m) |
| 4 | Torque applied on the rotor between the torso and back left hip | -1 | 1 | hip_3 (back_leg) | hinge | torque (N m) |
| 5 | Torque applied on the rotor between the back left two links | -1 | 1 | ankle_3 (back_leg) | hinge | torque (N m) |
| 6 | Torque applied on the rotor between the torso and back right hip | -1 | 1 | hip_4 (right_back_leg) | hinge | torque (N m) |
| 7 | Torque applied on the rotor between the back right two links | -1 | 1 | ankle_4 (right_back_leg) | hinge | torque (N m) |

Observation Space

Observations consist of positional values of different body parts of the ant, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

By default, observations do not include the x- and y-coordinates of the ant’s torso. These may be included by passing exclude_current_positions_from_observation=False during construction. In that case, the observation space will have 113 dimensions where the first two dimensions represent the x- and y-coordinates of the ant’s torso. Regardless of whether exclude_current_positions_from_observation was set to true or false, the x- and y-coordinates of the torso will be returned in info with keys "x_position" and "y_position", respectively.

By default, an observation is an ndarray with shape (111,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
| 1 | x-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 2 | y-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 3 | z-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 4 | w-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 5 | angle between torso and first link on front left | -Inf | Inf | hip_1 (front_left_leg) | hinge | angle (rad) |
| 6 | angle between the two links on the front left | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angle (rad) |
| 7 | angle between torso and first link on front right | -Inf | Inf | hip_2 (front_right_leg) | hinge | angle (rad) |
| 8 | angle between the two links on the front right | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angle (rad) |
| 9 | angle between torso and first link on back left | -Inf | Inf | hip_3 (back_leg) | hinge | angle (rad) |
| 10 | angle between the two links on the back left | -Inf | Inf | ankle_3 (back_leg) | hinge | angle (rad) |
| 11 | angle between torso and first link on back right | -Inf | Inf | hip_4 (right_back_leg) | hinge | angle (rad) |
| 12 | angle between the two links on the back right | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angle (rad) |
| 13 | x-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 14 | y-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 15 | z-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 16 | x-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 17 | y-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 18 | z-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 19 | angular velocity of angle between torso and front left link | -Inf | Inf | hip_1 (front_left_leg) | hinge | angular velocity (rad/s) |
| 20 | angular velocity of the angle between front left links | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angular velocity (rad/s) |
| 21 | angular velocity of angle between torso and front right link | -Inf | Inf | hip_2 (front_right_leg) | hinge | angular velocity (rad/s) |
| 22 | angular velocity of the angle between front right links | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angular velocity (rad/s) |
| 23 | angular velocity of angle between torso and back left link | -Inf | Inf | hip_3 (back_leg) | hinge | angular velocity (rad/s) |
| 24 | angular velocity of the angle between back left links | -Inf | Inf | ankle_3 (back_leg) | hinge | angular velocity (rad/s) |
| 25 | angular velocity of angle between torso and back right link | -Inf | Inf | hip_4 (right_back_leg) | hinge | angular velocity (rad/s) |
| 26 | angular velocity of the angle between back right links | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angular velocity (rad/s) |

The remaining 14*6 = 84 elements of the observation are contact forces (external forces: force x, y, z and torque x, y, z) applied to the center of mass of each of the links. The 14 links are the ground link, the torso link, and 3 links for each leg (1 + 1 + 12), each with the 6 external forces.

The (x,y,z) coordinates are translational DOFs while the orientations are rotational DOFs expressed as quaternions. One can read more about free joints in the MuJoCo documentation.

Note: The Ant-v4 environment no longer has the following contact forces issue. For Ant versions earlier than v4, there have been reported issues that using a Mujoco-Py version > 2.0 results in the contact forces always being 0. As such, we recommend using a Mujoco-Py version < 2.0 with those versions of the Ant environment if you would like to report results with contact forces (if contact forces are not used in your experiments, you can use version > 2.0).
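
The dimension counts quoted above can be checked with a little arithmetic. The following is a minimal sketch in plain Python; the function name `ant_observation_size` is hypothetical and is not part of the environment's API.

```python
def ant_observation_size(exclude_current_positions_from_observation=True):
    """Length of the Ant observation vector, following the layout described above."""
    num_positions = 13    # rows 0-12: torso z, torso orientation (4), 8 joint angles
    num_velocities = 14   # rows 13-26: torso linear (3) and angular (3), 8 joint velocities
    num_links = 14        # ground + torso + 3 links for each of the 4 legs
    forces_per_link = 6   # force x, y, z and torque x, y, z
    size = num_positions + num_velocities + num_links * forces_per_link
    if not exclude_current_positions_from_observation:
        size += 2         # torso x- and y-coordinates come first
    return size


print(ant_observation_size())       # 111
print(ant_observation_size(False))  # 113
```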

Rewards

The reward consists of four parts:

healthy_reward: Every timestep that the ant is healthy (see definition in section “Episode End”), it gets a reward of fixed value healthy_reward.
forward_reward: A reward for moving forward, measured as (x-coordinate after action - x-coordinate before action)/dt. dt is the time between actions and depends on the frame_skip parameter (default is 5), where the frametime is 0.01, making the default dt = 5 * 0.01 = 0.05. This reward is positive if the ant moves forward (in the positive x direction).
ctrl_cost: A negative reward for penalising the ant if it takes actions that are too large. It is measured as ctrl_cost_weight * sum(action²) where ctrl_cost_weight is a parameter set for the control and has a default value of 0.5.
contact_cost: A negative reward for penalising the ant if the external contact force is too large. It is calculated as contact_cost_weight * sum(clip(external contact force to contact_force_range)²).

The total reward returned is reward = healthy_reward + forward_reward - ctrl_cost - contact_cost and info will also contain the individual reward terms.

Starting State

All observations start in state (0.0, 0.0, 0.75, 1.0, 0.0 … 0.0) with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the positional values and Gaussian noise with mean 0 and standard deviation reset_noise_scale added to the velocity values for stochasticity. Note that the initial z-coordinate is intentionally selected to be slightly high, thereby indicating a standing ant. The initial orientation is designed to make it face forward as well.

Episode End

The ant is said to be unhealthy if any of the following happens:

Any of the state space values is no longer finite
The z-coordinate of the torso is not in the closed interval given by healthy_z_range (defaults to [0.2, 1.0])

If terminate_when_unhealthy=True is passed during construction (which is the default), the episode ends when any of the following happens:

Truncation: The episode duration reaches 1000 timesteps
Termination: The ant is unhealthy

If terminate_when_unhealthy=False is passed, the episode ends only when 1000 timesteps are exceeded.

Arguments

No additional arguments are currently supported in v2 and lower.

env = gym.make('Ant-v2')

v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('Ant-v4', ctrl_cost_weight=0.1, ...)

| Parameter | Type | Default | Description |
|---|---|---|---|
| xml_file | str | "ant.xml" | Path to a MuJoCo model |
| ctrl_cost_weight | float | 0.5 | Weight for ctrl_cost term (see section on reward) |
| contact_cost_weight | float | 5e-4 | Weight for contact_cost term (see section on reward) |
| healthy_reward | float | 1 | Constant reward given if the ant is “healthy” after timestep |
| terminate_when_unhealthy | bool | True | If true, issue a done signal if the z-coordinate of the torso is no longer in the healthy_z_range |
| healthy_z_range | tuple | (0.2, 1) | The ant is considered healthy if the z-coordinate of the torso is in this range |
| contact_force_range | tuple | (-1, 1) | Contact forces are clipped to this range in the computation of contact_cost |
| reset_noise_scale | float | 0.1 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History

v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
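
The reward terms and the health check described in the Rewards and Episode End sections for the Ant can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the environment's implementation: the function names are hypothetical, the forward reward is taken as displacement in the positive x direction divided by dt (so that moving forward is rewarded, as the text states), and healthy_reward is only granted while the ant is healthy, following the prose.

```python
import math


def ant_dt(frame_skip=5, frametime=0.01):
    """Time between two actions: frame_skip simulation frames of frametime each."""
    return frame_skip * frametime


def ant_is_healthy(state, torso_z, healthy_z_range=(0.2, 1.0)):
    """Healthy means every state value is finite and the torso z is inside the closed range."""
    z_min, z_max = healthy_z_range
    return all(math.isfinite(v) for v in state) and z_min <= torso_z <= z_max


def ant_reward(x_before, x_after, action, contact_forces, healthy,
               ctrl_cost_weight=0.5, contact_cost_weight=5e-4,
               healthy_reward=1.0, contact_force_range=(-1.0, 1.0), dt=0.05):
    """reward = healthy_reward + forward_reward - ctrl_cost - contact_cost."""
    forward_reward = (x_after - x_before) / dt  # assumed sign: positive when moving in +x
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    low, high = contact_force_range
    clipped = [min(max(f, low), high) for f in contact_forces]
    contact_cost = contact_cost_weight * sum(f * f for f in clipped)
    survive = healthy_reward if healthy else 0.0
    return survive + forward_reward - ctrl_cost - contact_cost


# A healthy ant that moved 0.05 m forward in one step with zero torques and no contacts.
print(ant_reward(0.0, 0.05, [0.0] * 8, [0.0] * 84, healthy=True))
```

With the default weights, the control cost quickly dominates: applying the maximum torque on all eight hinges costs 0.5 * 8 = 4 per step.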

Declaration

Swift

public final class Ant : MuJoCoEnv

extension Ant: Env


                    
                    
DynamicGraph

A dynamic graph is a workspace for computations. All tensor variables can be tracked from a dynamic graph.

Declaration

Swift

public final class DynamicGraph


                    
                    
Model

A model is a base class for stateful operations on a dynamic graph. It can be used to construct computations statically, which makes them more efficient.

Declaration

Swift

public class Model : AnyModel


                    
                    
ModelBuilder

A model builder is a more generic type of model. A model can be quite static, so you have to be careful to make a model work with dynamic inputs. You cannot use reshape, or anything else that generates fixed tensor outputs from fixed inputs.

A model builder, on the other hand, doesn’t have that restriction. When the input changes, it simply calls the given builder closure to construct a new model. This way, you can continue to use reshape and similar operations that assume fixed inputs and outputs, and they will still work for dynamic inputs. The newly built model carries over the stateful components (parameters) from the old model, so it doesn’t reset your training. This also means you need to make sure the parameter shapes won’t change when the input changes; otherwise the program terminates with a fatal error.
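
The rebuild-and-carry-over behaviour described above can be illustrated with a small conceptual sketch. It is written in Python purely for illustration and does not use or reproduce the library's Swift API; `ToyModel`, `ToyModelBuilder` and `model_for` are hypothetical names, and a `RuntimeError` stands in for the fatal error.

```python
class ToyModel:
    """Hypothetical stand-in for a model built for one fixed input shape."""

    def __init__(self, input_shape, parameters):
        self.input_shape = input_shape
        self.parameters = parameters  # dict: name -> list of floats


class ToyModelBuilder:
    """Rebuilds the model when the input shape changes and keeps trained parameters."""

    def __init__(self, build):
        self._build = build   # closure: input_shape -> ToyModel
        self._model = None

    def model_for(self, input_shape):
        # Reuse the current model while the input shape is unchanged.
        if self._model is not None and self._model.input_shape == input_shape:
            return self._model
        new_model = self._build(input_shape)
        if self._model is not None:
            for name, old_values in self._model.parameters.items():
                new_values = new_model.parameters.get(name)
                if new_values is None or len(new_values) != len(old_values):
                    raise RuntimeError(f"parameter {name!r} changed shape")
                new_model.parameters[name] = old_values  # carry trained values over
        self._model = new_model
        return new_model


builder = ToyModelBuilder(lambda shape: ToyModel(shape, {"w": [0.0] * 4}))
first = builder.model_for((2, 4))
first.parameters["w"][0] = 1.5          # pretend a training step updated the weight
second = builder.model_for((3, 4))      # new input shape triggers a rebuild
print(second.parameters["w"][0])        # the trained value survives the rebuild
```

The key design point the sketch tries to capture is that parameters are keyed independently of the input shape, which is why a builder closure whose parameter shapes depend on the input cannot be supported.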

Declaration

Swift

public final class ModelBuilder<T> : AnyModelBuilder


                    
                    
HalfCheetah

Description

This environment is based on the work by P. Wawrzyński in “A Cat-Like Robot Real-Time Learning to Run”. The HalfCheetah is a 2-dimensional robot consisting of 9 links and 8 joints connecting them (including two paws). The goal is to apply a torque on the joints to make the cheetah run forward (right) as fast as possible, with a positive reward allocated based on the distance moved forward and a negative reward allocated for moving backward. The torso and head of the cheetah are fixed, and the torque can only be applied on the other 6 joints over the front and back thighs (connecting to the torso), shins (connecting to the thighs) and feet (connecting to the shins).

Action Space

The action space is a Box(-1, 1, (6,), float32). An action represents the torques applied between links.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the back thigh rotor | -1 | 1 | bthigh | hinge | torque (N m) |
| 1 | Torque applied on the back shin rotor | -1 | 1 | bshin | hinge | torque (N m) |
| 2 | Torque applied on the back foot rotor | -1 | 1 | bfoot | hinge | torque (N m) |
| 3 | Torque applied on the front thigh rotor | -1 | 1 | fthigh | hinge | torque (N m) |
| 4 | Torque applied on the front shin rotor | -1 | 1 | fshin | hinge | torque (N m) |
| 5 | Torque applied on the front foot rotor | -1 | 1 | ffoot | hinge | torque (N m) |

Observation Space

Observations consist of positional values of different body parts of the cheetah, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

By default, observations do not include the x-coordinate of the cheetah’s center of mass. It may be included by passing exclude_current_positions_from_observation=False during construction. In that case, the observation space will have 18 dimensions where the first dimension represents the x-coordinate of the cheetah’s center of mass. Regardless of whether exclude_current_positions_from_observation was set to true or false, the x-coordinate will be returned in info with key "x_position".

By default, the observation is an ndarray with shape (17,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | z-coordinate of the front tip | -Inf | Inf | rootz | slide | position (m) |
| 1 | angle of the front tip | -Inf | Inf | rooty | hinge | angle (rad) |
| 2 | angle of the back thigh | -Inf | Inf | bthigh | hinge | angle (rad) |
| 3 | angle of the back shin | -Inf | Inf | bshin | hinge | angle (rad) |
| 4 | angle of the back foot | -Inf | Inf | bfoot | hinge | angle (rad) |
| 5 | angle of the front thigh | -Inf | Inf | fthigh | hinge | angle (rad) |
| 6 | angle of the front shin | -Inf | Inf | fshin | hinge | angle (rad) |
| 7 | angle of the front foot | -Inf | Inf | ffoot | hinge | angle (rad) |
| 8 | velocity of the x-coordinate of the front tip | -Inf | Inf | rootx | slide | velocity (m/s) |
| 9 | velocity of the z-coordinate of the front tip | -Inf | Inf | rootz | slide | velocity (m/s) |
| 10 | angular velocity of the front tip | -Inf | Inf | rooty | hinge | angular velocity (rad/s) |
| 11 | angular velocity of the back thigh | -Inf | Inf | bthigh | hinge | angular velocity (rad/s) |
| 12 | angular velocity of the back shin | -Inf | Inf | bshin | hinge | angular velocity (rad/s) |
| 13 | angular velocity of the back foot | -Inf | Inf | bfoot | hinge | angular velocity (rad/s) |
| 14 | angular velocity of the front thigh | -Inf | Inf | fthigh | hinge | angular velocity (rad/s) |
| 15 | angular velocity of the front shin | -Inf | Inf | fshin | hinge | angular velocity (rad/s) |
| 16 | angular velocity of the front foot | -Inf | Inf | ffoot | hinge | angular velocity (rad/s) |
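
To make the default 17-element layout easier to work with, the entries can be labelled by the joint names from the Name column. This is a small sketch in plain Python; `label_observation` and the two name lists are hypothetical helpers, not part of the environment.

```python
# Joint names in the order given by the Name column of the observation table.
POSITION_JOINTS = ["rootz", "rooty", "bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot"]
VELOCITY_JOINTS = ["rootx"] + POSITION_JOINTS  # velocities also include rootx


def label_observation(observation):
    """Split a default (17,) observation into named positions and velocities."""
    if len(observation) != 17:
        raise ValueError("expected the default 17-element observation")
    positions = dict(zip(POSITION_JOINTS, observation[:8]))
    velocities = dict(zip(VELOCITY_JOINTS, observation[8:]))
    return positions, velocities


positions, velocities = label_observation([0.0] * 17)
print(sorted(velocities))  # nine velocity entries, including rootx
```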

Rewards

The reward consists of two parts:

forward_reward: A reward for moving forward, measured as forward_reward_weight * (x-coordinate after action - x-coordinate before action)/dt. dt is the time between actions and depends on the frame_skip parameter (fixed to 5), where the frametime is 0.01, making the default dt = 5 * 0.01 = 0.05. This reward is positive if the cheetah runs forward (right).
ctrl_cost: A cost for penalising the cheetah if it takes actions that are too large. It is measured as ctrl_cost_weight * sum(action²) where ctrl_cost_weight is a parameter set for the control and has a default value of 0.1.

The total reward returned is reward = forward_reward - ctrl_cost and info will also contain the individual reward terms.

Starting State

All observations start in state (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with noise added to the initial state for stochasticity. As seen before, the first 8 values in the state are positional and the last 9 values are velocities. Uniform noise in the range [-reset_noise_scale, reset_noise_scale] is added to the positional values, while Gaussian noise with a mean of 0 and a standard deviation of reset_noise_scale is added to the initial velocity values of all zeros.

Episode End

The episode truncates when the episode length is greater than 1000.

Arguments

No additional arguments are currently supported in v2 and lower.

env = gym.make('HalfCheetah-v2')

v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('HalfCheetah-v4', ctrl_cost_weight=0.1, ...)

| Parameter | Type | Default | Description |
|---|---|---|---|
| xml_file | str | "half_cheetah.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1.0 | Weight for forward_reward term (see section on reward) |
| ctrl_cost_weight | float | 0.1 | Weight for ctrl_cost term (see section on reward) |
| reset_noise_scale | float | 0.1 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History

v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
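
The HalfCheetah starting state and reward described above can be sketched with the Python standard library. The function names are hypothetical, the sketch works on the default 17-element observation only, and the forward reward is assumed to be displacement in the positive x direction over dt so that running forward is rewarded, as the text states.

```python
import random


def half_cheetah_initial_observation(rng, reset_noise_scale=0.1):
    """All-zero state plus uniform noise on 8 positions and Gaussian noise on 9 velocities."""
    positions = [rng.uniform(-reset_noise_scale, reset_noise_scale) for _ in range(8)]
    velocities = [rng.gauss(0.0, reset_noise_scale) for _ in range(9)]
    return positions + velocities


def half_cheetah_reward(x_before, x_after, action,
                        forward_reward_weight=1.0, ctrl_cost_weight=0.1, dt=0.05):
    """reward = forward_reward - ctrl_cost, with dt = frame_skip * frametime = 5 * 0.01."""
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return forward_reward - ctrl_cost


rng = random.Random(0)  # seeded only to make the sketch reproducible
observation = half_cheetah_initial_observation(rng)
print(len(observation))
print(half_cheetah_reward(0.0, 0.1, [1.0] * 6))
```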

Declaration

Swift

public final class HalfCheetah : MuJoCoEnv

extension HalfCheetah: Env


                    
                    
Hopper

Description

This environment is based on the work done by Erez, Tassa, and Todorov in “Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks”. The environment aims to increase the number of independent state and control variables as compared to the classic control environments. The hopper is a two-dimensional one-legged figure that consists of four main body parts: the torso at the top, the thigh in the middle, the leg at the bottom, and a single foot on which the entire body rests. The goal is to make hops that move in the forward (right) direction by applying torques on the three hinges connecting the four body parts.

Action Space

The action space is a Box(-1, 1, (3,), float32). An action represents the torques applied between links.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the thigh rotor | -1 | 1 | thigh_joint | hinge | torque (N m) |
| 1 | Torque applied on the leg rotor | -1 | 1 | leg_joint | hinge | torque (N m) |
| 2 | Torque applied on the foot rotor | -1 | 1 | foot_joint | hinge | torque (N m) |

Observation Space

Observations consist of positional values of different body parts of the hopper, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

By default, observations do not include the x-coordinate of the hopper. It may be included by passing exclude_current_positions_from_observation=False during construction. In that case, the observation space will have 12 dimensions where the first dimension represents the x-coordinate of the hopper. Regardless of whether exclude_current_positions_from_observation was set to true or false, the x-coordinate will be returned in info with key "x_position".

By default, the observation is an ndarray with shape (11,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz | slide | position (m) |
| 1 | angle of the top | -Inf | Inf | rooty | hinge | angle (rad) |
| 2 | angle of the thigh joint | -Inf | Inf | thigh_joint | hinge | angle (rad) |
| 3 | angle of the leg joint | -Inf | Inf | leg_joint | hinge | angle (rad) |
| 4 | angle of the foot joint | -Inf | Inf | foot_joint | hinge | angle (rad) |
| 5 | velocity of the x-coordinate of the top | -Inf | Inf | rootx | slide | velocity (m/s) |
| 6 | velocity of the z-coordinate (height) of the top | -Inf | Inf | rootz | slide | velocity (m/s) |
| 7 | angular velocity of the angle of the top | -Inf | Inf | rooty | hinge | angular velocity (rad/s) |
| 8 | angular velocity of the thigh hinge | -Inf | Inf | thigh_joint | hinge | angular velocity (rad/s) |
| 9 | angular velocity of the leg hinge | -Inf | Inf | leg_joint | hinge | angular velocity (rad/s) |
| 10 | angular velocity of the foot hinge | -Inf | Inf | foot_joint | hinge | angular velocity (rad/s) |

Rewards

The reward consists of three parts:

healthy_reward: Every timestep that the hopper is healthy (see definition in section “Episode End”), it gets a reward of fixed value healthy_reward.
forward_reward: A reward for hopping forward, measured as forward_reward_weight * (x-coordinate after action - x-coordinate before action)/dt. dt is the time between actions and depends on the frame_skip parameter (fixed to 4), where the frametime is 0.002, making the default dt = 4 * 0.002 = 0.008. This reward is positive if the hopper hops forward (positive x direction).
ctrl_cost: A cost for penalising the hopper if it takes actions that are too large. It is measured as ctrl_cost_weight * sum(action²) where ctrl_cost_weight is a parameter set for the control and has a default value of 0.001.

The total reward returned is reward = healthy_reward + forward_reward - ctrl_cost and info will also contain the individual reward terms.

Starting State

All observations start in state (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the values for stochasticity.

Episode End

The hopper is said to be unhealthy if any of the following happens:

An element of observation[1:] (if exclude_current_positions_from_observation=True, else observation[2:]) is no longer contained in the closed interval specified by the argument healthy_state_range
The height of the hopper (observation[0] if exclude_current_positions_from_observation=True, else observation[1]) is no longer contained in the closed interval specified by the argument healthy_z_range (usually meaning that it has fallen)
The angle (observation[1] if exclude_current_positions_from_observation=True, else observation[2]) is no longer contained in the closed interval specified by the argument healthy_angle_range

If terminate_when_unhealthy=True is passed during construction (which is the default), the episode ends when any of the following happens:

Truncation: The episode duration reaches 1000 timesteps
Termination: The hopper is unhealthy

If terminate_when_unhealthy=False is passed, the episode ends only when 1000 timesteps are exceeded.

Arguments

No additional arguments are currently supported in v2 and lower.

env = gym.make('Hopper-v2')

v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('Hopper-v4', ctrl_cost_weight=0.1, ...)

| Parameter | Type | Default | Description |
|---|---|---|---|
| xml_file | str | "hopper.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1.0 | Weight for forward_reward term (see section on reward) |
| ctrl_cost_weight | float | 0.001 | Weight for ctrl_cost term (see section on reward) |
| healthy_reward | float | 1 | Constant reward given if the hopper is “healthy” after timestep |
| terminate_when_unhealthy | bool | True | If true, issue a done signal if the hopper is no longer healthy |
| healthy_state_range | tuple | (-100, 100) | The elements of observation[1:] (if exclude_current_positions_from_observation=True, else observation[2:]) must be in this range for the hopper to be considered healthy |
| healthy_z_range | tuple | (0.7, float("inf")) | The z-coordinate must be in this range for the hopper to be considered healthy |
| healthy_angle_range | tuple | (-0.2, 0.2) | The angle given by observation[1] (if exclude_current_positions_from_observation=True, else observation[2]) must be in this range for the hopper to be considered healthy |
| reset_noise_scale | float | 5e-3 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History

v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
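
The three health conditions in the Episode End section for the Hopper depend on where the height and angle sit in the observation, which shifts by one when the x-coordinate is included. The sketch below works through that indexing and the reward formula in plain Python. The function names are hypothetical, and the forward reward is assumed to be displacement in the positive x direction over dt.

```python
def hopper_is_healthy(observation, exclude_current_positions_from_observation=True,
                      healthy_state_range=(-100.0, 100.0),
                      healthy_z_range=(0.7, float("inf")),
                      healthy_angle_range=(-0.2, 0.2)):
    """Apply the state, height and angle range checks to one observation."""
    # The x-coordinate, when present, occupies index 0 and is never range-checked.
    offset = 0 if exclude_current_positions_from_observation else 1
    height = observation[offset]
    angle = observation[offset + 1]
    rest = observation[offset + 1:]
    state_ok = all(healthy_state_range[0] <= v <= healthy_state_range[1] for v in rest)
    z_ok = healthy_z_range[0] <= height <= healthy_z_range[1]
    angle_ok = healthy_angle_range[0] <= angle <= healthy_angle_range[1]
    return state_ok and z_ok and angle_ok


def hopper_reward(x_before, x_after, action, healthy, forward_reward_weight=1.0,
                  ctrl_cost_weight=0.001, healthy_reward=1.0, dt=4 * 0.002):
    """reward = healthy_reward + forward_reward - ctrl_cost."""
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return (healthy_reward if healthy else 0.0) + forward_reward - ctrl_cost


upright = [1.25] + [0.0] * 10   # default layout: height first, then angle
print(hopper_is_healthy(upright))
print(hopper_is_healthy([0.5] + [0.0] * 10))  # below healthy_z_range: has fallen
```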

Declaration

Swift

public final class Hopper : MuJoCoEnv

extension Hopper: Env


                    
                    
HTTPRenderServer

Declaration

Swift

public class HTTPRenderServer


                    
                    
Humanoid

Description

This environment is based on the environment introduced by Tassa, Erez and Todorov in “Synthesis and stabilization of complex behaviors through online trajectory optimization”. The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and arms. The legs each consist of two links, and so do the arms (the joints between the links represent the knees and elbows, respectively). The goal of the environment is to walk forward as fast as possible without falling over.

Action Space

The action space is a Box(-0.4, 0.4, (17,), float32). An action represents the torques applied at the hinge joints.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the hinge in the y-coordinate of the abdomen | -0.4 | 0.4 | abdomen_y | hinge | torque (N m) |
| 1 | Torque applied on the hinge in the z-coordinate of the abdomen | -0.4 | 0.4 | abdomen_z | hinge | torque (N m) |
| 2 | Torque applied on the hinge in the x-coordinate of the abdomen | -0.4 | 0.4 | abdomen_x | hinge | torque (N m) |
| 3 | Torque applied on the rotor between torso/abdomen and the right hip (x-coordinate) | -0.4 | 0.4 | right_hip_x (right_thigh) | hinge | torque (N m) |
| 4 | Torque applied on the rotor between torso/abdomen and the right hip (z-coordinate) | -0.4 | 0.4 | right_hip_z (right_thigh) | hinge | torque (N m) |
| 5 | Torque applied on the rotor between torso/abdomen and the right hip (y-coordinate) | -0.4 | 0.4 | right_hip_y (right_thigh) | hinge | torque (N m) |
| 6 | Torque applied on the rotor between the right hip/thigh and the right shin | -0.4 | 0.4 | right_knee | hinge | torque (N m) |
| 7 | Torque applied on the rotor between torso/abdomen and the left hip (x-coordinate) | -0.4 | 0.4 | left_hip_x (left_thigh) | hinge | torque (N m) |
| 8 | Torque applied on the rotor between torso/abdomen and the left hip (z-coordinate) | -0.4 | 0.4 | left_hip_z (left_thigh) | hinge | torque (N m) |
| 9 | Torque applied on the rotor between torso/abdomen and the left hip (y-coordinate) | -0.4 | 0.4 | left_hip_y (left_thigh) | hinge | torque (N m) |
| 10 | Torque applied on the rotor between the left hip/thigh and the left shin | -0.4 | 0.4 | left_knee | hinge | torque (N m) |
| 11 | Torque applied on the rotor between the torso and right upper arm (coordinate -1) | -0.4 | 0.4 | right_shoulder1 | hinge | torque (N m) |
| 12 | Torque applied on the rotor between the torso and right upper arm (coordinate -2) | -0.4 | 0.4 | right_shoulder2 | hinge | torque (N m) |
| 13 | Torque applied on the rotor between the right upper arm and right lower arm | -0.4 | 0.4 | right_elbow | hinge | torque (N m) |
| 14 | Torque applied on the rotor between the torso and left upper arm (coordinate -1) | -0.4 | 0.4 | left_shoulder1 | hinge | torque (N m) |
| 15 | Torque applied on the rotor between the torso and left upper arm (coordinate -2) | -0.4 | 0.4 | left_shoulder2 | hinge | torque (N m) |
| 16 | Torque applied on the rotor between the left upper arm and left lower arm | -0.4 | 0.4 | left_elbow | hinge | torque (N m) |
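
A policy that outputs values outside the control range needs its actions clipped before they are applied. The following is a minimal sketch in plain Python, assuming the per-actuator range of [-0.4, 0.4] listed in the Control Min and Control Max columns above; `clip_humanoid_action` is a hypothetical helper, not part of the environment.

```python
def clip_humanoid_action(action, low=-0.4, high=0.4, size=17):
    """Clamp each of the 17 torques to the control range given in the action table."""
    if len(action) != size:
        raise ValueError(f"expected {size} torques, got {len(action)}")
    return [min(max(a, low), high) for a in action]


raw = [1.0, -1.0] + [0.1] * 15  # e.g. a policy output scaled to [-1, 1]
print(clip_humanoid_action(raw)[:3])
```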

Observation Space

Observations consist of positional values of different body parts of the Humanoid, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

By default, observations do not include the x- and y-coordinates of the torso. These may be included by passing exclude_current_positions_from_observation=False during construction. In that case, the observation space will have 378 dimensions, where the first two dimensions represent the x- and y-coordinates of the torso. Regardless of whether exclude_current_positions_from_observation is set to true or false, the x- and y-coordinates are returned in info with keys "x_position" and "y_position", respectively.

By default, however, the observation is an ndarray with shape (376,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | z-coordinate of the torso (centre) | -Inf | Inf | root | free | position (m) |
| 1 | x-orientation of the torso (centre) | -Inf | Inf | root | free | angle (rad) |
| 2 | y-orientation of the torso (centre) | -Inf | Inf | root | free | angle (rad) |
| 3 | z-orientation of the torso (centre) | -Inf | Inf | root | free | angle (rad) |
| 4 | w-orientation of the torso (centre) | -Inf | Inf | root | free | angle (rad) |
| 5 | z-angle of the abdomen (in lower_waist) | -Inf | Inf | abdomen_z | hinge | angle (rad) |
| 6 | y-angle of the abdomen (in lower_waist) | -Inf | Inf | abdomen_y | hinge | angle (rad) |
| 7 | x-angle of the abdomen (in pelvis) | -Inf | Inf | abdomen_x | hinge | angle (rad) |
| 8 | x-coordinate of angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_x | hinge | angle (rad) |
| 9 | z-coordinate of angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_z | hinge | angle (rad) |
| 10 | y-coordinate of angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_y | hinge | angle (rad) |
| 11 | angle between right hip and the right shin (in right_knee) | -Inf | Inf | right_knee | hinge | angle (rad) |
| 12 | x-coordinate of angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_x | hinge | angle (rad) |
| 13 | z-coordinate of angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_z | hinge | angle (rad) |
| 14 | y-coordinate of angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_y | hinge | angle (rad) |
| 15 | angle between left hip and the left shin (in left_knee) | -Inf | Inf | left_knee | hinge | angle (rad) |
| 16 | coordinate-1 (multi-axis) angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder1 | hinge | angle (rad) |
| 17 | coordinate-2 (multi-axis) angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder2 | hinge | angle (rad) |
| 18 | angle between right upper arm and right_lower_arm | -Inf | Inf | right_elbow | hinge | angle (rad) |
| 19 | coordinate-1 (multi-axis) angle between torso and left arm (in left_upper_arm) | -Inf | Inf | left_shoulder1 | hinge | angle (rad) |
| 20 | coordinate-2 (multi-axis) angle between torso and left arm (in left_upper_arm) | -Inf | Inf | left_shoulder2 | hinge | angle (rad) |
| 21 | angle between left upper arm and left_lower_arm | -Inf | Inf | left_elbow | hinge | angle (rad) |
| 22 | x-coordinate velocity of the torso (centre) | -Inf | Inf | root | free | velocity (m/s) |
| 23 | y-coordinate velocity of the torso (centre) | -Inf | Inf | root | free | velocity (m/s) |
| 24 | z-coordinate velocity of the torso (centre) | -Inf | Inf | root | free | velocity (m/s) |
| 25 | x-coordinate angular velocity of the torso (centre) | -Inf | Inf | root | free | angular velocity (rad/s) |
| 26 | y-coordinate angular velocity of the torso (centre) | -Inf | Inf | root | free | angular velocity (rad/s) |
| 27 | z-coordinate angular velocity of the torso (centre) | -Inf | Inf | root | free | angular velocity (rad/s) |
| 28 | z-coordinate of angular velocity of the abdomen (in lower_waist) | -Inf | Inf | abdomen_z | hinge | angular velocity (rad/s) |
| 29 | y-coordinate of angular velocity of the abdomen (in lower_waist) | -Inf | Inf | abdomen_y | hinge | angular velocity (rad/s) |
| 30 | x-coordinate of angular velocity of the abdomen (in pelvis) | -Inf | Inf | abdomen_x | hinge | angular velocity (rad/s) |
| 31 | x-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_x | hinge | angular velocity (rad/s) |
| 32 | z-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_z | hinge | angular velocity (rad/s) |
| 33 | y-coordinate of the angular velocity of the angle between pelvis and right hip (in right_thigh) | -Inf | Inf | right_hip_y | hinge | angular velocity (rad/s) |
| 34 | angular velocity of the angle between right hip and the right shin (in right_knee) | -Inf | Inf | right_knee | hinge | angular velocity (rad/s) |
| 35 | x-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_x | hinge | angular velocity (rad/s) |
| 36 | z-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_z | hinge | angular velocity (rad/s) |
| 37 | y-coordinate of the angular velocity of the angle between pelvis and left hip (in left_thigh) | -Inf | Inf | left_hip_y | hinge | angular velocity (rad/s) |
| 38 | angular velocity of the angle between left hip and the left shin (in left_knee) | -Inf | Inf | left_knee | hinge | angular velocity (rad/s) |
| 39 | coordinate-1 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder1 | hinge | angular velocity (rad/s) |
| 40 | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and right arm (in right_upper_arm) | -Inf | Inf | right_shoulder2 | hinge | angular velocity (rad/s) |
| 41 | angular velocity of the angle between right upper arm and right_lower_arm | -Inf | Inf | right_elbow | hinge | angular velocity (rad/s) |
| 42 | coordinate-1 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm) | -Inf | Inf | left_shoulder1 | hinge | angular velocity (rad/s) |
| 43 | coordinate-2 (multi-axis) of the angular velocity of the angle between torso and left arm (in left_upper_arm) | -Inf | Inf | left_shoulder2 | hinge | angular velocity (rad/s) |
| 44 | angular velocity of the angle between left upper arm and left_lower_arm | -Inf | Inf | left_elbow | hinge | angular velocity (rad/s) |

Additionally, after all the positional and velocity based values in the table, the observation contains (in order):

cinert: Mass and inertia of a single rigid body relative to the center of mass (this is an intermediate result of transition). It has shape 14 * 10 (nbody * 10) and hence adds another 140 elements to the state space.
cvel: Center of mass based velocity. It has shape 14 * 6 (nbody * 6) and hence adds another 84 elements to the state space.
qfrc_actuator: Constraint force generated as the actuator force. This has shape (23,) (nv * 1) and hence adds another 23 elements to the state space.
cfrc_ext: This is the center of mass based external force on the body. It has shape 14 * 6 (nbody * 6) and hence adds another 84 elements to the state space.

Here nbody stands for the number of bodies in the robot and nv stands for the number of degrees of freedom (= dim(qvel)).

The (x,y,z) coordinates are translational DOFs while the orientations are rotational DOFs expressed as quaternions. One can read more about free joints in the Mujoco Documentation.

Note: The Humanoid-v4 environment no longer has the following contact forces issue. With Humanoid versions prior to v4, there have been reported issues that using a Mujoco-Py version > 2.0 results in the contact forces always being 0. As such we recommend using a Mujoco-Py version < 2.0 with the Humanoid environment if you would like to report results with contact forces (if contact forces are not used in your experiments, you can use version > 2.0).

Rewards

The reward consists of four parts:
healthy_reward: Every timestep that the humanoid is alive (see section Episode End for the definition), it gets a reward of fixed value healthy_reward.
forward_reward: A reward for walking forward, measured as forward_reward_weight * (average center of mass after action - average center of mass before action)/dt. dt is the time between actions and depends on the frame_skip parameter (default is 5), where the frametime is 0.003, making the default dt = 5 * 0.003 = 0.015. This reward is positive if the humanoid walks forward (in the positive x-direction). The calculation for the center of mass is defined in the .py file for the Humanoid.
ctrl_cost: A negative reward for penalising the humanoid if its control force is too large. If there are nu actuators/controls, then the control has shape nu x 1. It is measured as ctrl_cost_weight * sum(control²).
contact_cost: A negative reward for penalising the humanoid if the external contact force is too large. It is calculated by clipping contact_cost_weight * sum(external contact force²) to the interval specified by contact_cost_range.

The total reward returned is reward = healthy_reward + forward_reward - ctrl_cost - contact_cost, and info will also contain the individual reward terms.

Starting State

All observations start in state (0.0, 0.0, 1.4, 1.0, 0.0 … 0.0) with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the positional and velocity values (values in the table) for stochasticity. Note that the initial z-coordinate is intentionally selected to be high, thereby indicating a standing humanoid. The initial orientation is designed to make it face forward as well.

Episode End

The humanoid is said to be unhealthy if the z-position of the torso is no longer contained in the closed interval specified by the argument healthy_z_range.

If terminate_when_unhealthy=True is passed during construction (which is the default), the episode ends when any of the following happens:
Truncation: The episode duration reaches 1000 timesteps.
Termination: The humanoid is unhealthy.

If terminate_when_unhealthy=False is passed, the episode ends only when 1000 timesteps are exceeded.

Arguments

No additional arguments are currently supported in v2 and lower.

env = gym.make('Humanoid-v4')

v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('Humanoid-v4', ctrl_cost_weight=0.1, ....)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| xml_file | str | "humanoid.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1.25 | Weight for forward_reward term (see section on reward) |
| ctrl_cost_weight | float | 0.1 | Weight for ctrl_cost term (see section on reward) |
| contact_cost_weight | float | 5e-7 | Weight for contact_cost term (see section on reward) |
| healthy_reward | float | 5.0 | Constant reward given if the humanoid is “healthy” after timestep |
| terminate_when_unhealthy | bool | True | If true, issue a done signal if the z-coordinate of the torso is no longer in the healthy_z_range |
| healthy_z_range | tuple | (1.0, 2.0) | The humanoid is considered healthy if the z-coordinate of the torso is in this range |
| reset_noise_scale | float | 1e-2 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History
v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
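
The observation size and the reward formula given in the Observation Space and Rewards sections can be checked with a short Python sketch. The function names and arguments are hypothetical and written only for this illustration. The interval contact_cost_range must be supplied by the caller because its default is not listed here, and treating healthy_reward as zero when the humanoid is unhealthy is a simplifying assumption.

```python
def humanoid_observation_size(nbody=14, nv=23, table_positions=22, include_xy=False):
    """Add up the observation components in the order the text lists them."""
    size = table_positions + nv  # table rows 0-21 (positions) and 22-44 (velocities)
    size += nbody * 10           # cinert
    size += nbody * 6            # cvel
    size += nv                   # qfrc_actuator
    size += nbody * 6            # cfrc_ext
    if include_xy:               # exclude_current_positions_from_observation=False
        size += 2
    return size


def humanoid_reward(x_before, x_after, control, contact_forces, contact_cost_range,
                    dt=0.015, forward_reward_weight=1.25, ctrl_cost_weight=0.1,
                    contact_cost_weight=5e-7, healthy_reward=5.0, is_healthy=True):
    """Hypothetical helper combining the four reward terms described above."""
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * sum(c * c for c in control)
    raw_contact = contact_cost_weight * sum(f * f for f in contact_forces)
    low, high = contact_cost_range
    contact_cost = min(max(raw_contact, low), high)  # clip to contact_cost_range
    alive = healthy_reward if is_healthy else 0.0    # simplifying assumption
    return alive + forward_reward - ctrl_cost - contact_cost
```

With the defaults from the Arguments table, moving the center of mass forward by 0.015 m in one step with zero control and zero contact force gives a forward term of 1.25 on top of the healthy reward of 5.0.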

Declaration

Swift

public final class Humanoid : MuJoCoEnv

extension Humanoid: Env


                    
                    
                    InvertedDoublePendulum

Description

This environment originates from control theory and builds on the cartpole environment based on the work done by Barto, Sutton, and Anderson in “Neuronlike adaptive elements that can solve difficult learning control problems”. It is powered by the Mujoco physics simulator, which allows for more complex experiments (such as varying the effects of gravity or constraints). This environment involves a cart that can be moved linearly, with a pole fixed on it and a second pole fixed on the other end of the first one (leaving the second pole as the only one with a free end). The cart can be pushed left or right, and the goal is to balance the second pole on top of the first pole, which is in turn on top of the cart, by applying continuous forces on the cart.

Action Space

The agent takes a 1-element vector for actions. The action space is a continuous (action) in [-1, 1], where action represents the numerical force applied to the cart (with magnitude representing the amount of force and sign representing the direction).

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Force applied on the cart | -1 | 1 | slider | slide | Force (N) |

Observation Space

The state space consists of positional values of different body parts of the pendulum system, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

The observation is an ndarray with shape (11,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | position of the cart along the linear surface | -Inf | Inf | slider | slide | position (m) |
| 1 | sine of the angle between the cart and the first pole | -Inf | Inf | sin(hinge) | hinge | unitless |
| 2 | sine of the angle between the two poles | -Inf | Inf | sin(hinge2) | hinge | unitless |
| 3 | cosine of the angle between the cart and the first pole | -Inf | Inf | cos(hinge) | hinge | unitless |
| 4 | cosine of the angle between the two poles | -Inf | Inf | cos(hinge2) | hinge | unitless |
| 5 | velocity of the cart | -Inf | Inf | slider | slide | velocity (m/s) |
| 6 | angular velocity of the angle between the cart and the first pole | -Inf | Inf | hinge | hinge | angular velocity (rad/s) |
| 7 | angular velocity of the angle between the two poles | -Inf | Inf | hinge2 | hinge | angular velocity (rad/s) |
| 8 | constraint force - 1 | -Inf | Inf | | | Force (N) |
| 9 | constraint force - 2 | -Inf | Inf | | | Force (N) |
| 10 | constraint force - 3 | -Inf | Inf | | | Force (N) |

There is physical contact between the robots and their environment, and Mujoco attempts to produce realistic physics simulations of the possible contact dynamics by aiming for physical accuracy and computational efficiency. There is one constraint force for contacts for each degree of freedom (3). The approach and handling of constraints by Mujoco is unique to the simulator and is based on their research. One can find more information in their documentation or in their paper “Analytically-invertible dynamics with contacts and constraints: Theory and implementation in MuJoCo”.

Rewards

The reward consists of three parts:

alive_bonus: The goal is to make the second inverted pendulum stand upright (within a certain angle limit) as long as possible - as such a reward of +10 is awarded for each timestep that the second pole is upright.
distance_penalty: This reward is a measure of how far the tip of the second pendulum (the only free end) moves, and it is calculated as 0.01 * x² + (y - 2)², where x is the x-coordinate of the tip and y is the y-coordinate of the tip of the second pole.
velocity_penalty: A negative reward for penalising the agent if it moves too fast, calculated as 0.001 * v₁² + 0.005 * v₂².

The total reward returned is reward = alive_bonus - distance_penalty - velocity_penalty

Starting State

All observations start in state (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with uniform noise in the range [-0.1, 0.1] added to the positional values (cart position and pole angles) and standard normal noise with a standard deviation of 0.1 added to the velocity values for stochasticity.

Episode End

The episode ends when any of the following happens:

1. Truncation: The episode duration reaches 1000 timesteps.
2. Termination: Any of the state space values is no longer finite.
3. Termination: The y-coordinate of the tip of the second pole is less than or equal to 1. (The maximum standing height of the system is 1.196 m, when all the parts are perpendicularly vertical on top of each other.)

Arguments

No additional arguments are currently supported.

env = gym.make('InvertedDoublePendulum-v4')

There is no v3 for InvertedDoublePendulum, unlike the robot environments where v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

Version History
v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)
v0: Initial versions release (1.0.0)
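
The reward and termination rules from the Rewards and Episode End sections can be written out as a small Python sketch. The function names are hypothetical. The text does not define v₁ and v₂, so reading them as the two hinge angular velocities is an assumption.

```python
import math


def double_pendulum_reward(tip_x, tip_y, v1, v2):
    """reward = alive_bonus - distance_penalty - velocity_penalty."""
    alive_bonus = 10.0
    distance_penalty = 0.01 * tip_x ** 2 + (tip_y - 2) ** 2
    # v1, v2: assumed here to be the angular velocities of the two hinges.
    velocity_penalty = 0.001 * v1 ** 2 + 0.005 * v2 ** 2
    return alive_bonus - distance_penalty - velocity_penalty


def double_pendulum_terminated(observation, tip_y):
    """Termination: a non-finite state value, or the tip at or below y = 1."""
    all_finite = all(math.isfinite(value) for value in observation)
    return (not all_finite) or tip_y <= 1.0
```

The distance penalty vanishes only when the tip sits at x = 0, y = 2, so the full bonus of 10 is an upper bound on the per-step reward.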

Declaration

Swift

public final class InvertedDoublePendulum : MuJoCoEnv

extension InvertedDoublePendulum: Env


                    
                    
                    InvertedPendulum

Description

This environment is the cartpole environment based on the work done by Barto, Sutton, and Anderson in “Neuronlike adaptive elements that can solve difficult learning control problems”, just like in the classic environments, but now powered by the Mujoco physics simulator, which allows for more complex experiments (such as varying the effects of gravity). This environment involves a cart that can be moved linearly, with a pole fixed on it at one end and having another end free. The cart can be pushed left or right, and the goal is to balance the pole on top of the cart by applying forces on the cart.

Action Space

The agent takes a 1-element vector for actions. The action space is a continuous (action) in [-3, 3], where action represents the numerical force applied to the cart (with magnitude representing the amount of force and sign representing the direction).

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Force applied on the cart | -3 | 3 | slider | slide | Force (N) |

Observation Space

The state space consists of positional values of different body parts of the pendulum system, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities.

The observation is an ndarray with shape (4,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | position of the cart along the linear surface | -Inf | Inf | slider | slide | position (m) |
| 1 | vertical angle of the pole on the cart | -Inf | Inf | hinge | hinge | angle (rad) |
| 2 | linear velocity of the cart | -Inf | Inf | slider | slide | velocity (m/s) |
| 3 | angular velocity of the pole on the cart | -Inf | Inf | hinge | hinge | angular velocity (rad/s) |

Rewards

The goal is to make the inverted pendulum stand upright (within a certain angle limit) as long as possible - as such a reward of +1 is awarded for each timestep that the pole is upright.

Starting State

All observations start in state (0.0, 0.0, 0.0, 0.0) with a uniform noise in the range of [-0.01, 0.01] added to the values for stochasticity.
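
A minimal Python sketch of the reset rule described above: each of the four state values starts at 0.0 and receives independent uniform noise in [-0.01, 0.01]. The function name and the use of Python's random module are assumptions for illustration, not the environment's actual sampling code.

```python
import random


def sample_initial_state(rng, noise=0.01, size=4):
    """Return the all-zero start state perturbed by uniform noise."""
    return [0.0 + rng.uniform(-noise, noise) for _ in range(size)]


# Example: a seeded generator makes the perturbation reproducible.
state = sample_initial_state(random.Random(0))
```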

Episode End

The episode ends when any of the following happens:

Truncation: The episode duration reaches 1000 timesteps.
Termination: Any of the state space values is no longer finite.
Termination: The absolute value of the vertical angle between the pole and the cart is greater than 0.2 radians.

Arguments

No additional arguments are currently supported.

env = gym.make('InvertedPendulum-v4')

There is no v3 for InvertedPendulum, unlike the robot environments where v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

Version History
v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks (including inverted pendulum)
v0: Initial versions release (1.0.0)
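
The three conditions listed under Episode End can be sketched in Python as follows. The function name and the (terminated, truncated) return convention are assumptions for illustration. Index 1 of the observation is the pole angle, as in the observation table.

```python
import math


def inverted_pendulum_episode_status(observation, elapsed_steps,
                                     max_steps=1000, angle_limit=0.2):
    """Return (terminated, truncated) for one step of the episode."""
    all_finite = all(math.isfinite(value) for value in observation)
    pole_angle = observation[1]
    terminated = (not all_finite) or abs(pole_angle) > angle_limit
    truncated = elapsed_steps >= max_steps
    return terminated, truncated
```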

Declaration

Swift

public final class InvertedPendulum : MuJoCoEnv

extension InvertedPendulum: Env

Sum

Sum inputs.

Declaration

Swift

public final class Sum : Model

Add

Add two inputs together. Broadcasts if needed.

Declaration

Swift

public final class Add : Model

Mul

Multiply two inputs together. Broadcasts if needed.

Declaration

Swift

public final class Mul : Model
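
Add and Mul are described as broadcasting when needed. The Python sketch below shows how an output shape is derived under NumPy-style broadcasting. Whether the library follows exactly these rules is an assumption, and broadcast_shape is a hypothetical helper, not part of the API.

```python
def broadcast_shape(shape_a, shape_b):
    """Output shape of an elementwise op under NumPy-style broadcasting."""
    result = []
    # Compare dimensions from the trailing axis backwards; missing axes count as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
        result.append(b if a == 1 else a)
    return tuple(reversed(result))
```

For example, shapes (2, 3) and (3,) combine to (2, 3), while (2, 3) and (4,) are rejected.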

Div

Divide one input by the other. Does not broadcast.

Declaration

Swift

public final class Div : Model


                    
                    
                    SquareRoot

Square root of an input. Does not broadcast.

Declaration

Swift

public final class SquareRoot : Model


                    
                    
                    Matmul

Matrix-multiplication over two inputs.

Declaration

Swift

public final class Matmul : Model


                    
                    
                    Cmul

Complex number multiplication over two inputs.

Declaration

Swift

public final class Cmul : Model


                    
                    
                    Dense

A linear layer model.

Declaration

Swift

public final class Dense : Model


                    
                    
                    Reshape

A reshape model.

Declaration

Swift

public final class Reshape : Model

Pad

A pad model.

Declaration

Swift

public final class Pad : Model


                    
                    
                    Identity

An identity model.

Declaration

Swift

public final class Identity : Model


                    
                    
                    Permute

A permute model.

Declaration

Swift

public final class Permute : Model


                    
                    
                    ReLU

A ReLU activation model.

Declaration

Swift

public final class ReLU : Model


                    
                    
                    LeakyReLU

A leaky ReLU activation model.

Declaration

Swift

public final class LeakyReLU : Model


                    
                    
                    Softmax

A softmax activation model.

Declaration

Swift

public final class Softmax : Model
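
For reference, the conventional softmax over a single vector can be sketched in Python as below. This shows the standard mathematical definition only. Which axis the Softmax model normalizes over is not stated here, so the one-dimensional input is an assumption.

```python
import math


def softmax(values):
    """Standard softmax; subtracting the maximum keeps exp() from overflowing."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```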


                    
                    
                    Sigmoid

A sigmoid activation model.

Declaration

Swift

public final class Sigmoid : Model


                    
                    
                    Tanh

A tanh activation model.

Declaration

Swift

public final class Tanh : Model


                    
                    
                    Swish

A swish activation model.

Declaration

Swift

public final class Swish : Model


                    
                    
                    GELU

A GELU activation model.

Declaration

Swift

public final class GELU : Model


                    
                    
                    Transpose

Declaration

Swift

public final class Transpose : Model


                    
                    
                    MaskedFill

The masked fill model. Where a value equals a given constant, it is filled with another constant.

Declaration

Swift

public final class MaskedFill : Model
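
A minimal Python sketch of masked-fill semantics, assuming the comparison is made against a separate mask input of the same length. The function and parameter names are hypothetical and chosen for illustration only.

```python
def masked_fill(values, mask, equal_to, fill_with):
    """Replace values[i] with fill_with wherever mask[i] equals equal_to."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [fill_with if m == equal_to else v for v, m in zip(values, mask)]
```

A common use is to write a large negative constant into attention scores at masked positions before a softmax.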


                    
                    
                    Dropout

The dropout model.

Declaration

Swift

public final class Dropout : Model


                    
                    
                    Scalmul

Multiply all values with a constant.

Declaration

Swift

public final class Scalmul : Model


                    
                    
                    BatchNorm

Batch normalization model.

Declaration

Swift

public final class BatchNorm : Model


                    
                    
                    LayerNorm

Layer normalization model.

Declaration

Swift

public final class LayerNorm : Model


                    
                    
                    GroupNorm

Group normalization model.

Declaration

Swift

public final class GroupNorm : Model


                    
                    
                    RMSNorm

RMSNorm model.

Declaration

Swift

public final class RMSNorm : Model
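
To contrast LayerNorm and RMSNorm, the Python sketch below applies the textbook definitions to a single vector, leaving out the learnable scale and bias. The epsilon value and the function names are assumptions and do not reflect the library's defaults.

```python
import math


def layer_norm(x, eps=1e-5):
    """Subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """Divide by the root mean square; the mean is not subtracted."""
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]
```

Unlike layer_norm, rms_norm preserves the sign of every input element.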


                    
                    
                    Flatten

Flatten the input tensor into a 1-D tensor (respecting N).

Declaration

Swift

public final class Flatten : Model


                    
                    
                    Convolution

Convolution model.

Declaration

Swift

public final class Convolution : Model


                    
                    
                    ConvolutionTranspose

Convolution Transpose model.

Declaration

Swift

public final class ConvolutionTranspose : Model


                    
                    
                    MaxPool

Max pooling model.

Declaration

Swift

public final class MaxPool : Model


                    
                    
                    AveragePool

Average pooling model.

Declaration

Swift

public final class AveragePool : Model


                    
                    
                    Upsample

Upsample model.

Declaration

Swift

public final class Upsample : Model


                    
                    
                    ReduceSum

Reduce sum model.

Declaration

Swift

public final class ReduceSum : Model


                    
                    
                    ReduceMean

Reduce mean model.

Declaration

Swift

public final class ReduceMean : Model


                    
                    
                    ReduceMax

Reduce max model.

Declaration

Swift

public final class ReduceMax : Model


                    
                    
                    ReduceMin

Reduce min model.

Declaration

Swift

public final class ReduceMin : Model


                    
                    
                    ReduceNorm2

Reduce norm2 model.

Declaration

Swift

public final class ReduceNorm2 : Model

Min

Min model.

Declaration

Swift

public final class Min : Model

Max

Max model.

Declaration

Swift

public final class Max : Model


                    
                    
                    Extract

Extract model.

Declaration

Swift

public final class Extract : Model


                    
                    
                    Argmax

Argmax model.

Declaration

Swift

public final class Argmax : Model


                    
                    
                    Argmin

Argmin model.

Declaration

Swift

public final class Argmin : Model


                    
                    
                    Concat

Concatenate model.

Declaration

Swift

public final class Concat : Model


                    
                    
                    Chunk

Chunk model.

Declaration

Swift

public final class Chunk : Model


                    
                    
                    LSTM

LSTM model.

Declaration

Swift

public final class LSTM : Model


                    
                    
                    Embedding

Embedding model.

Declaration

Swift

public final class Embedding : Model


                    
                    
                    IndexSelect

IndexSelect model.

Declaration

Swift

public final class IndexSelect : Model


                    
                    
                    DatatypeConversion

DatatypeConversion model.

Declaration

Swift

public final class DatatypeConversion : Model


                    
                    
                    Contiguous

“Making” contiguous model.

Declaration

Swift

public final class Contiguous : Model


                    
                    
                    Copy

“Making” copy model.

Declaration

Swift

public final class Copy : Model


                    
                    
                    Parameter

Parameter model.

Declaration

Swift

public final class Parameter<Element> : Model where Element : TensorNumeric

extension Parameter: ModelIOConvertible


                    
                    
                    Scalar

Scalar model.

Declaration

Swift

public final class Scalar : Model


                    
                    
                    Variable

Variable model.

Declaration

Swift

public final class Variable<Element> : Model where Element : TensorNumeric

extension Variable: ModelIOConvertible


                    
                    
                    ScaledDotProductAttention

Scaled-dot-product-attention model.

Declaration

Swift

public final class ScaledDotProductAttention : Model
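
As background, the standard scaled dot-product attention, softmax(Q Kᵀ / sqrt(d)) V, can be sketched in plain Python for small matrices. This illustrates the usual formula only. The model's actual parameters (masking, multiple heads, scaling options) are not described here and are not represented.

```python
import math


def scaled_dot_product_attention(queries, keys, values):
    """queries: n x d, keys: m x d, values: m x c (lists of lists); returns n x c."""
    d = len(queries[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        shift = max(scores)
        weights = [math.exp(s - shift) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the value rows.
        output.append([sum(w * v[c] for w, v in zip(weights, values))
                       for c in range(len(values[0]))])
    return output
```

A query that is equally similar to all keys receives the plain average of the value rows.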


                    
                    
                    Debug

Debug model.

Declaration

Swift

public final class Debug : Model


                    
                    
                    CustomModel

Custom model.

Declaration

Swift

public final class CustomModel : Model


                    
                    
                    AnyModelBuilder

A type-erased model builder.

Declaration

Swift

public class AnyModelBuilder : AnyModel


                    
                    
                    Input

Model Inputs for Functional Model

Declaration

Swift

public final class Input : Model.IO


                    
                    
                    MuJoCoVideo

Declaration

Swift

public final class MuJoCoVideo<EnvType> where EnvType : MuJoCoEnv

extension MuJoCoVideo: Renderable


                    
                    
                    MuJoCoViewer

Declaration

Swift

public final class MuJoCoViewer<EnvType> where EnvType : MuJoCoEnv

extension MuJoCoViewer: Renderable


                    
                    
                    StreamContext

A stream context is an object that an execution can be performed upon.

Declaration

Swift

public final class StreamContext


                    
                    
                    Swimmer

Description

This environment corresponds to the Swimmer environment described in Rémi Coulom’s PhD thesis “Reinforcement Learning Using Neural Networks, with Applications to Motor Control”. The environment aims to increase the number of independent state and control variables compared to the classic control environments. The swimmers consist of three or more segments (‘links’) and one fewer articulation joints (‘rotors’), with each rotor joint connecting exactly two links to form a linear chain. The swimmer is suspended in a two-dimensional pool and always starts in the same position (subject to some deviation drawn from a uniform distribution), and the goal is to move as fast as possible towards the right by applying torque on the rotors and using the fluid's friction.

Notes

The problem parameters are:

n: number of body parts
m_i: mass of part i (i ∈ {1…n})
l_i: length of part i (i ∈ {1…n})
k: viscous-friction coefficient

The default environment has n = 3, l_i = 0.1, and k = 0.1. It is possible to pass a custom MuJoCo XML file during construction to increase the number of links, or to tweak any of the parameters.

Action Space

The action space is a Box(-1, 1, (2,), float32). An action represents the torques applied between links.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Torque applied on the first rotor | -1 | 1 | motor1_rot | hinge | torque (N m) |
| 1 | Torque applied on the second rotor | -1 | 1 | motor2_rot | hinge | torque (N m) |

Observation Space

By default, observations consist of:
θ_i: angle of part i with respect to the x axis
θ_i': its derivative with respect to time (angular velocity)

In the default case, observations do not include the x- and y-coordinates of the front tip. These may be included by passing exclude_current_positions_from_observation=False during construction. Then, the observation space will have 10 dimensions, where the first two dimensions represent the x- and y-coordinates of the front tip. Regardless of whether exclude_current_positions_from_observation is set to true or false, the x- and y-coordinates are returned in info with keys "x_position" and "y_position", respectively.

By default, the observation is an ndarray with shape (8,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | angle of the front tip | -Inf | Inf | free_body_rot | hinge | angle (rad) |
| 1 | angle of the first rotor | -Inf | Inf | motor1_rot | hinge | angle (rad) |
| 2 | angle of the second rotor | -Inf | Inf | motor2_rot | hinge | angle (rad) |
| 3 | velocity of the tip along the x-axis | -Inf | Inf | slider1 | slide | velocity (m/s) |
| 4 | velocity of the tip along the y-axis | -Inf | Inf | slider2 | slide | velocity (m/s) |
| 5 | angular velocity of front tip | -Inf | Inf | free_body_rot | hinge | angular velocity (rad/s) |
| 6 | angular velocity of first rotor | -Inf | Inf | motor1_rot | hinge | angular velocity (rad/s) |
| 7 | angular velocity of second rotor | -Inf | Inf | motor2_rot | hinge | angular velocity (rad/s) |

Rewards

The reward consists of two parts:
forward_reward: A reward for moving forward, measured as forward_reward_weight * (x-coordinate after action - x-coordinate before action)/dt. dt is the time between actions and depends on the frame_skip parameter (default is 4) and the frametime of 0.01, making the default dt = 4 * 0.01 = 0.04. This reward is positive if the swimmer swims to the right, as desired.
ctrl_cost: A cost for penalising the swimmer if it takes actions that are too large. It is measured as ctrl_cost_weight * sum(action²), where ctrl_cost_weight is a parameter set for the control and has a default value of 1e-4.

The total reward returned is reward = forward_reward - ctrl_cost, and info also contains the individual reward terms.

Starting State

All observations start in state (0,0,0,0,0,0,0,0), with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the initial state for stochasticity.

Episode End

The episode truncates when the episode length is greater than 1000.

Arguments

No additional arguments are currently supported in v2 and lower.

gym.make('Swimmer-v4')

v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('Swimmer-v4', ctrl_cost_weight=0.1, ....)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| xml_file | str | "swimmer.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1.0 | Weight for forward_reward term (see section on reward) |
| ctrl_cost_weight | float | 1e-4 | Weight for ctrl_cost term (see section on reward) |
| reset_noise_scale | float | 0.1 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History

v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
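
The Swimmer reward arithmetic described in the Rewards section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the environment's implementation: the function name swimmer_reward, its argument names, and the info keys are hypothetical, and the forward displacement is taken as x after the action minus x before it, so that swimming right yields a positive reward.

```python
def swimmer_reward(x_before, x_after, action,
                   forward_reward_weight=1.0, ctrl_cost_weight=1e-4,
                   frame_skip=4, frametime=0.01):
    """Return (reward, info) for one step (illustrative helper, not a library API)."""
    # Time between actions: default 4 * 0.01 = 0.04
    dt = frame_skip * frametime
    # Assumption: displacement is after - before, so moving right is positive
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity
    # Quadratic penalty on the two rotor torques
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward - ctrl_cost
    # Hypothetical key names for the individual reward terms
    info = {"forward_reward": forward_reward, "ctrl_cost": ctrl_cost}
    return reward, info


reward, info = swimmer_reward(x_before=0.0, x_after=0.04, action=[1.0, -1.0])
print(reward, info)
```

With the defaults, moving 0.04 units to the right in one step gives a forward velocity of 1.0, and the two unit torques cost 2e-4 in total.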

Declaration

Swift

public final class Swimmer : MuJoCoEnv

extension Swimmer: Env


                    
                    
                    CmdParamsFactory

Declaration

Swift

public final class CmdParamsFactory


                    
                    
                    AnyTensorStorage

Declaration

Swift

public final class AnyTensorStorage


                    
                    
                    TimeLimit

Declaration

Swift

public final class TimeLimit<EnvType> where EnvType : Env

extension TimeLimit: Env where EnvType.TerminatedType == Bool

extension TimeLimit: MuJoCoEnv where EnvType: MuJoCoEnv


                    
                    
                    VecEnv

Declaration

Swift

public final class VecEnv<EnvType: Env, Element: TensorNumeric>
where EnvType.ActType == Tensor<Element>, EnvType.ObsType == Tensor<Element>

extension VecEnv: Env where EnvType.TerminatedType == Bool


                    
                    
                    Walker2D

Description

This environment builds on the hopper environment based on the work done by Erez, Tassa, and Todorov in “Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks” by adding another set of legs, making it possible for the robot to walk forward instead of hop. Like other MuJoCo environments, this environment aims to increase the number of independent state and control variables as compared to the classic control environments. The walker is a two-dimensional two-legged figure that consists of four main body parts: a single torso at the top (with the two legs splitting after the torso), two thighs in the middle below the torso, two legs at the bottom below the thighs, and two feet attached to the legs on which the entire body rests. The goal is to coordinate both sets of feet, legs, and thighs to move in the forward (right) direction by applying torques on the six hinges connecting the seven body parts.

Action Space

The action space is a Box(-1, 1, (6,), float32). An action represents the torques applied at the hinge joints.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Torque applied on the thigh rotor | -1 | 1 | thigh_joint | hinge | torque (N m) |
| 1 | Torque applied on the leg rotor | -1 | 1 | leg_joint | hinge | torque (N m) |
| 2 | Torque applied on the foot rotor | -1 | 1 | foot_joint | hinge | torque (N m) |
| 3 | Torque applied on the left thigh rotor | -1 | 1 | thigh_left_joint | hinge | torque (N m) |
| 4 | Torque applied on the left leg rotor | -1 | 1 | leg_left_joint | hinge | torque (N m) |
| 5 | Torque applied on the left foot rotor | -1 | 1 | foot_left_joint | hinge | torque (N m) |
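
One way to keep a policy's raw output inside the Box(-1, 1, (6,)) bounds is to clamp each torque before stepping. The sketch below is a plain-Python illustration; the helper name clip_action is hypothetical, and clamping is only one possible way of handling out-of-range values.

```python
def clip_action(action, low=-1.0, high=1.0, size=6):
    """Clamp each torque to [low, high] (illustrative helper, not a library API)."""
    if len(action) != size:
        raise ValueError(f"expected {size} torques, got {len(action)}")
    # Order follows the action table: thigh, leg, foot, left thigh, left leg, left foot
    return [min(high, max(low, float(a))) for a in action]


print(clip_action([2, -3, 0.5, 0, 1, -1]))
```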

Observation Space

Observations consist of positional values of different body parts of the walker, followed by the velocities of those individual parts (their derivatives), with all the positions ordered before all the velocities. By default, observations do not include the x-coordinate of the top. It may be included by passing exclude_current_positions_from_observation=False during construction. In that case, the observation space has 18 dimensions, where the first dimension represents the x-coordinate of the top of the walker. Regardless of whether exclude_current_positions_from_observation is set to true or false, the x-coordinate of the top is returned in info with key "x_position".

By default, the observation is an ndarray with shape (17,) where the elements correspond to the following:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | z-coordinate of the top (height of the walker) | -Inf | Inf | rootz (torso) | slide | position (m) |
| 1 | angle of the top | -Inf | Inf | rooty (torso) | hinge | angle (rad) |
| 2 | angle of the thigh joint | -Inf | Inf | thigh_joint | hinge | angle (rad) |
| 3 | angle of the leg joint | -Inf | Inf | leg_joint | hinge | angle (rad) |
| 4 | angle of the foot joint | -Inf | Inf | foot_joint | hinge | angle (rad) |
| 5 | angle of the left thigh joint | -Inf | Inf | thigh_left_joint | hinge | angle (rad) |
| 6 | angle of the left leg joint | -Inf | Inf | leg_left_joint | hinge | angle (rad) |
| 7 | angle of the left foot joint | -Inf | Inf | foot_left_joint | hinge | angle (rad) |
| 8 | velocity of the x-coordinate of the top | -Inf | Inf | rootx | slide | velocity (m/s) |
| 9 | velocity of the z-coordinate (height) of the top | -Inf | Inf | rootz | slide | velocity (m/s) |
| 10 | angular velocity of the angle of the top | -Inf | Inf | rooty | hinge | angular velocity (rad/s) |
| 11 | angular velocity of the thigh hinge | -Inf | Inf | thigh_joint | hinge | angular velocity (rad/s) |
| 12 | angular velocity of the leg hinge | -Inf | Inf | leg_joint | hinge | angular velocity (rad/s) |
| 13 | angular velocity of the foot hinge | -Inf | Inf | foot_joint | hinge | angular velocity (rad/s) |
| 14 | angular velocity of the left thigh hinge | -Inf | Inf | thigh_left_joint | hinge | angular velocity (rad/s) |
| 15 | angular velocity of the left leg hinge | -Inf | Inf | leg_left_joint | hinge | angular velocity (rad/s) |
| 16 | angular velocity of the left foot hinge | -Inf | Inf | foot_left_joint | hinge | angular velocity (rad/s) |
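
The layout above (positions first, then nine velocities, with an optional leading x-coordinate) could be unpacked as follows. This is a minimal sketch that works on any plain sequence of numbers; the function name split_walker2d_observation and the returned dictionary keys are illustrative and not part of any library.

```python
def split_walker2d_observation(obs, exclude_current_positions_from_observation=True):
    """Split a Walker2D observation into positions and velocities (illustrative)."""
    # 8 positional entries by default, 9 when the x-coordinate is included
    n_pos = 8 if exclude_current_positions_from_observation else 9
    n_vel = 9
    if len(obs) != n_pos + n_vel:
        raise ValueError(f"expected {n_pos + n_vel} values, got {len(obs)}")
    positions = list(obs[:n_pos])
    velocities = list(obs[n_pos:])
    # When x is included it occupies index 0, shifting height and angle by one
    offset = 0 if exclude_current_positions_from_observation else 1
    return {
        "height": positions[offset],
        "angle": positions[offset + 1],
        "positions": positions,
        "velocities": velocities,
    }


parts = split_walker2d_observation([0.0] * 17)
print(len(parts["positions"]), len(parts["velocities"]))
```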

Rewards

The reward consists of three parts:

healthy_reward: Every timestep that the walker is alive, it receives a fixed reward of value healthy_reward.
forward_reward: A reward for walking forward, measured as forward_reward_weight * (x-coordinate after action - x-coordinate before action)/dt. dt is the time between actions and depends on the frame_skip parameter (default is 4) and the frametime of 0.002, making the default dt = 4 * 0.002 = 0.008. This reward is positive if the walker walks forward (right), as desired.
ctrl_cost: A cost for penalising the walker if it takes actions that are too large. It is measured as ctrl_cost_weight * sum(action²), where ctrl_cost_weight is a parameter set for the control and has a default value of 0.001.

The total reward returned is reward = healthy_reward + forward_reward - ctrl_cost, and info also contains the individual reward terms.

Starting State

All observations start in state (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), with uniform noise in the range [-reset_noise_scale, reset_noise_scale] added to the values for stochasticity.

Episode End

The walker is said to be unhealthy if any of the following happens:
Any of the state space values is no longer finite
The height of the walker is not in the closed interval specified by healthy_z_range
The absolute value of the angle (observation[1] if exclude_current_positions_from_observation=True, else observation[2]) is not in the closed interval specified by healthy_angle_range

If terminate_when_unhealthy=True is passed during construction (which is the default), the episode ends when any of the following happens:
Truncation: The episode duration reaches 1000 timesteps
Termination: The walker is unhealthy

If terminate_when_unhealthy=False is passed, the episode ends only when 1000 timesteps are exceeded.

Arguments

No additional arguments are currently supported in v2 and lower.

env = gym.make('Walker2d-v4')

v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

env = gym.make('Walker2d-v4', ctrl_cost_weight=0.1, ....)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| xml_file | str | "walker2d.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1.0 | Weight for forward_reward term (see section on reward) |
| ctrl_cost_weight | float | 1e-3 | Weight for ctrl_cost term (see section on reward) |
| healthy_reward | float | 1.0 | Constant reward given if the walker is “healthy” after timestep |
| terminate_when_unhealthy | bool | True | If true, issue a done signal if the z-coordinate of the walker is no longer healthy |
| healthy_z_range | tuple | (0.8, 2) | The z-coordinate of the top of the walker must be in this range to be considered healthy |
| healthy_angle_range | tuple | (-1, 1) | The angle must be in this range to be considered healthy |
| reset_noise_scale | float | 5e-3 | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |

Version History

v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
v2: All continuous control environments now use mujoco_py >= 1.50
v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
v0: Initial versions release (1.0.0)
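
The Walker2D health check, reward, and episode-end rules described above can be combined into a small plain-Python sketch. It is an illustration under stated assumptions rather than the environment's code: the names is_healthy and walker2d_step_outcome are hypothetical, the angle itself is compared against healthy_angle_range (which agrees with the absolute-value wording for the symmetric default range), the healthy bonus is granted only while the walker is healthy, and forward displacement is taken as x after the action minus x before it.

```python
import math


def is_healthy(state, z, angle,
               healthy_z_range=(0.8, 2.0), healthy_angle_range=(-1.0, 1.0)):
    """Apply the three health conditions using closed intervals (illustrative)."""
    finite = all(math.isfinite(v) for v in state)
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range
    return finite and min_z <= z <= max_z and min_angle <= angle <= max_angle


def walker2d_step_outcome(x_before, x_after, action, healthy, step_count,
                          forward_reward_weight=1.0, ctrl_cost_weight=1e-3,
                          healthy_reward=1.0, terminate_when_unhealthy=True,
                          frame_skip=4, frametime=0.002, max_steps=1000):
    """Return (reward, terminated, truncated) for one step (illustrative)."""
    dt = frame_skip * frametime  # default 4 * 0.002 = 0.008
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # Assumption: the bonus is paid only on healthy steps
    bonus = healthy_reward if healthy else 0.0
    reward = bonus + forward_reward - ctrl_cost
    terminated = terminate_when_unhealthy and not healthy
    truncated = step_count >= max_steps
    return reward, terminated, truncated


state = [1.25, 0.0] + [0.0] * 15
healthy = is_healthy(state, z=state[0], angle=state[1])
print(walker2d_step_outcome(0.0, 0.008, [0.0] * 6, healthy, step_count=1))
```

Keeping the health check separate from the reward computation makes it easy to see how terminate_when_unhealthy changes only the termination flag, not the reward terms.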

Declaration

Swift

public final class Walker2D : MuJoCoEnv

extension Walker2D: Env