toddlerbot.policies package

Submodules

toddlerbot.policies.balance_pd module

PD control policy for robot balancing and basic manipulation tasks.

This module provides a balance control policy using proportional-derivative (PD) controllers for center of mass positioning, torso orientation control, and basic manipulation capabilities.

class toddlerbot.policies.balance_pd.BalancePDPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], joystick: Joystick | None = None, cameras: List[Camera] | None = None, zmq_receiver: ZMQNode | None = None, zmq_sender: ZMQNode | None = None, ip: str = '', fixed_command: ndarray[Any, dtype[float32]] | None = None, use_torso_pd: bool = True)

Bases: BasePolicy

Policy for balancing the robot using a PD controller.

com_fk(knee_angle, hip_pitch_angle=None, hip_roll_angle=None)

Calculates the center of mass (CoM) position of a leg segment based on joint angles.

Parameters:
  • knee_angle (float | ArrayType) – The angle of the knee joint in radians.

  • hip_pitch_angle (Optional[float | ArrayType]) – The angle of the hip pitch joint in radians. Defaults to None.

  • hip_roll_angle (Optional[float | ArrayType]) – The angle of the hip roll joint in radians. Defaults to None.

Returns:

A 3D vector representing the CoM position in Cartesian coordinates.

Return type:

ArrayType

com_ik(com_z, com_x=None, com_y=None)

Calculates the inverse kinematics for the center of mass (COM) of a bipedal robot leg.

This function computes the joint angles required to position the robot’s leg such that the center of mass is at the specified coordinates. It uses the lengths of the leg segments and default positions to determine the necessary joint angles.

Parameters:
  • com_z (float or ArrayType) – The z-coordinate of the center of mass.

  • com_x (Optional[float or ArrayType]) – The x-coordinate of the center of mass. Defaults to 0.0 if not provided.

  • com_y (Optional[float or ArrayType]) – The y-coordinate of the center of mass. Defaults to 0.0 if not provided.

Returns:

An array of joint angles for the leg, including hip pitch, hip roll, knee, and ankle pitch.

Return type:

ArrayType

get_arm_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Retrieve the positions of the arm motors from the observation data.

Parameters:

obs (Obs) – The observation data containing motor positions.

Returns:

An array of the arm motor positions.

Return type:

npt.NDArray[np.float32]

get_command(control_inputs: Dict[str, float]) ndarray[Any, dtype[float32]]

Generates a command array based on control inputs for various tasks.

This function processes a dictionary of control inputs, mapping each task to a corresponding command value within a predefined range. It also manages the state of a button press to toggle logging.

Parameters:

control_inputs (Dict[str, float]) – A dictionary where keys are task names and values are the corresponding input values.

Returns:

An array of command values, each corresponding to a task, with values adjusted according to the input and predefined command ranges.

Return type:

npt.NDArray[np.float32]

get_motor_target(arm_joint_pos: ndarray[Any, dtype[float32]], command: ndarray[Any, dtype[float32]], neck_joint_pos: ndarray[Any, dtype[float32]] = None)
get_neck_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Retrieve the positions of the arm motors from the observation data.

Parameters:

obs (Obs) – The observation data containing motor positions.

Returns:

An array of the arm motor positions.

Return type:

npt.NDArray[np.float32]

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Processes an observation to compute the next action and control inputs for a robotic system.

Parameters:
  • obs (Obs) – The current observation containing sensor data and time information.

  • is_real (bool, optional) – Indicates if the operation is in a real environment. Defaults to False.

Returns:

A tuple containing control inputs and the target motor positions for the next step.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.calibrate module

Robot zero point calibration policy using torso pitch PID control.

This module implements a calibration policy that uses a PID controller to maintain the robot’s torso pitch at zero degrees for accurate zero-point calibration of motors.

class toddlerbot.policies.calibrate.CalibratePolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], kp: float = 0.1, kd: float = 0.01, ki: float = 0.2)

Bases: BasePolicy

Policy for calibrating zero point with the robot’s torso pitch.

close(exp_folder_path: str)

Save calibrated motor zero positions to config file.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a control step to maintain the torso pitch at zero using a PD+I controller.

Parameters:
  • obs (Obs) – The current observation containing state information such as time, Euler angles, and angular velocities.

  • is_real (bool, optional) – Flag indicating whether the step is being executed in a real environment. Defaults to False.

Returns:

A tuple containing an empty dictionary and an array of motor target angles.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.cartwheel module

Cartwheel movement policy for dynamic acrobatic maneuvers.

This module implements a cartwheel policy that extends MJXPolicy to perform dynamic cartwheel movements with phase-based motion generation.

class toddlerbot.policies.cartwheel.CartwheelPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], path: str, joystick: Joystick | None = None, fixed_command: ndarray[Any, dtype[float32]] | None = None)

Bases: MJXPolicy

Cartwheel policy for the toddlerbot robot.

get_phase_signal(time_curr: float, num_frames: int = 215)

Get the phase signal for the current time.

toddlerbot.policies.crawl module

Crawling locomotion policy for low-profile movement.

This module implements a crawling policy that enables the robot to move in a prone position using coordinated limb movements.

class toddlerbot.policies.crawl.CrawlPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], path: str, joystick: Joystick | None = None, fixed_command: ndarray[Any, dtype[float32]] | None = None)

Bases: MJXPolicy

Crawling policy for the toddlerbot robot.

get_phase_signal(time_curr: float, init_idx: int = 0, num_frames: int = 950)

Get the phase signal for the current time.

toddlerbot.policies.dp_policy module

Deep policy for vision-based manipulation tasks using learned models.

This module implements a deep policy that combines vision and action data to perform complex manipulation tasks like picking and hugging using trained models.

class toddlerbot.policies.dp_policy.DPPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], ckpt: str, joystick: Joystick | None = None, cameras: List[Camera] | None = None, zmq_receiver: ZMQNode | None = None, zmq_sender: ZMQNode | None = None, ip: str = '', fixed_command: ndarray[Any, dtype[float32]] | None = None, task: str = '')

Bases: BalancePDPolicy

Policy for executing manipulation tasks using a deep policy model.

close(exp_folder_path)

Save recorded camera frames as a video.

get_arm_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Calculates the arm motor positions based on the current observation and task.

This function determines the appropriate motor positions for the robot’s arm by utilizing a sequence of model-generated actions. If the observation deque is full, it retrieves actions from the model and adjusts them based on the task type. For a “pick” task, it combines the model actions with the current manipulator positions. If the robot lacks a gripper, it adjusts the action sequence accordingly. If the observation deque is not full, it defaults to using the current manipulator positions.

Parameters:

obs (Obs) – The current observation of the environment.

Returns:

The calculated arm motor positions.

Return type:

npt.NDArray[np.float32]

reset() None

Resets the state of the object to its initial configuration.

This method clears the observation deque, resets the model action sequence, sets the manipulation count to zero, and clears the wrap-up time. It also invokes the reset method of the superclass to ensure any inherited state is also reset.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a step in the robot’s control loop, updating motor targets and handling task-specific logic.

Parameters:
  • obs (Obs) – The current observation containing sensor data and motor positions.

  • is_real (bool, optional) – Flag indicating whether the step is being executed in a real environment. Defaults to False.

Returns:

A tuple containing control inputs and the target motor positions.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.handstand module

Handstand policy for performing handstand maneuvers.

This module implements a handstand policy that positions the robot into a handstand pose with arms extended upward.

class toddlerbot.policies.handstand.HandstandPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]])

Bases: BasePolicy

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Generates the next action based on the current observation and preparation phase.

Parameters:
  • obs (Obs) – The current observation containing the time and other relevant data.

  • is_real (bool, optional) – Flag indicating if the step is being executed in a real environment. Defaults to False.

Returns:

A tuple containing an empty dictionary and the next action as a numpy array. If the current time is within the preparation duration, the action is interpolated based on the preparation time and action; otherwise, the default motor position is returned.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.mjx_policy module

MJX-based neural policy for reinforcement learning control.

This module implements a policy that uses trained neural networks via ONNX Runtime to control the robot using observations and commands with frame stacking and action buffering.

class toddlerbot.policies.mjx_policy.MJXPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], path: str, joystick: Joystick | None = None, fixed_command: ndarray[Any, dtype[float32]] | None = None)

Bases: BasePolicy

Policy for controlling the robot using the MJX model.

get_command(obs: Obs, control_inputs: Dict[str, float]) ndarray[Any, dtype[float32]]

Returns a fixed command as a NumPy array.

Parameters:

control_inputs (Dict[str, float]) – A dictionary of control inputs, where keys are input names and values are their respective float values.

Returns:

A fixed command represented as a NumPy array of float32 values.

Return type:

npt.NDArray[np.float32]

get_phase_signal(time_curr: float) ndarray[Any, dtype[float32]]

Get the phase signal at the current time.

Parameters:

time_curr (float) – The current time for which the phase signal is requested.

Returns:

An array containing the phase signal as a float32 value.

Return type:

npt.NDArray[np.float32]

reset()

Resets the internal state of the policy to its initial configuration.

This method clears the observation history, phase signal, command list, and action buffer. It also sets the standing state to True and initializes the last action and current step counter to zero.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Processes a single step in the control loop, updating the system’s state and generating motor target positions.

Parameters:
  • obs (Obs) – The current observation containing motor positions, velocities, and other sensor data.

  • is_real (bool, optional) – Indicates if the system is operating in a real environment. Defaults to False.

Returns:

A tuple containing the control inputs and the target motor positions.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.mjx_policy.load_wandb_policy(name: str, project='ToddlerBot', entity='toddlerbot') Dict[str, Any]

Load a policy from WandB artifacts.

toddlerbot.policies.pull_up module

Pull-up exercise policy using vision-guided grasping.

This module implements a pull-up policy that uses computer vision to detect targets and perform coordinated arm movements for pull-up exercises.

class toddlerbot.policies.pull_up.PullUpPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]])

Bases: BasePolicy

Policy for pulling up the robot.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a step in the control loop, determining the appropriate action based on the current observation and state.

Parameters:
  • obs (Obs) – The current observation containing time and other relevant data.

  • is_real (bool, optional) – Flag indicating whether the operation is in a real environment. Defaults to False.

Returns:

A tuple containing an empty dictionary and the computed action as a NumPy array.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.record module

toddlerbot.policies.replay module

Motion replay policy for executing pre-recorded movements.

This module implements a replay policy that can playback keyframe animations or recorded motion data with proper interpolation and synchronization.

class toddlerbot.policies.replay.ReplayPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], path: str)

Bases: BasePolicy

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a single step in the simulation or real environment, returning the current action.

This function determines the appropriate action to take based on the current observation and whether the environment is real or simulated. It handles the preparation phase if necessary and updates the action based on the current time and keyboard inputs.

Parameters:
  • obs (Obs) – The current observation containing the time and other relevant data.

  • is_real (bool, optional) – Indicates if the environment is real. Defaults to False.

Returns:

A tuple containing an empty dictionary and the action array for the current step.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.reset_pd module

Robot reset policy for returning to default configuration.

This module implements a reset policy that safely returns the robot to its default position using controlled movement trajectories.

class toddlerbot.policies.reset_pd.ResetPDPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], joystick: Joystick | None = None, cameras: List[Camera] | None = None, zmq_receiver: ZMQNode | None = None, zmq_sender: ZMQNode | None = None, ip: str = '', fixed_command: ndarray[Any, dtype[float32]] | None = None, use_torso_pd: bool = True)

Bases: BalancePDPolicy

Policy for resetting the robot to the default position.

get_arm_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Retrieves the positions of the arm motors from the observation data.

Parameters:

obs (Obs) – An observation object containing motor position data.

Returns:

An array of float32 values representing the positions of the arm motors.

Return type:

npt.NDArray[np.float32]

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a control step for a robotic system, managing motor positions and reset sequences.

This function processes the current observation to determine the appropriate motor targets. If a reset is required due to button press and waist motor positions exceeding a threshold, it calculates the necessary reset trajectory to bring the motors to their default positions. Otherwise, it performs a standard control step.

Parameters:
  • obs (Obs) – The current observation containing motor positions and time.

  • is_real (bool, optional) – Flag indicating if the operation is on a real system. Defaults to False.

Returns:

A tuple containing control inputs and the target motor positions.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.run_policy module

toddlerbot.policies.stand module

Simple standing policy for maintaining default posture.

This module implements a basic standing policy that keeps the robot in its default motor position configuration.

class toddlerbot.policies.stand.StandPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]])

Bases: BasePolicy

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Generates the next action based on the current observation and preparation phase.

Parameters:
  • obs (Obs) – The current observation containing the time and other relevant data.

  • is_real (bool, optional) – Flag indicating if the step is being executed in a real environment. Defaults to False.

Returns:

A tuple containing an empty dictionary and the next action as a numpy array. If the current time is within the preparation duration, the action is interpolated based on the preparation time and action; otherwise, the default motor position is returned.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.sysID module

System identification policy for motor characterization.

This module implements a system identification policy that generates chirp signals with varying amplitudes and PID gains to characterize motor dynamics and performance.

class toddlerbot.policies.sysID.SysIDPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]])

Bases: BasePolicy

System identification policy for the toddlerbot robot.

close(exp_folder_path: str)

Save checkpoint configuration data to file.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a step in the environment by interpolating an action based on the given observation time.

Parameters:
  • obs (Obs) – The observation containing the current time.

  • is_real (bool, optional) – Flag indicating whether the step is in a real environment. Defaults to False.

Returns:

A tuple containing an empty dictionary and the interpolated action as a NumPy array.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

class toddlerbot.policies.sysID.SysIDSpecs(amplitude_list: List[float], initial_frequency: float = 0.1, final_frequency: float = 10.0, decay_rate: float = 0.1, direction: float = 1, kp_list: List[float] | None = None, warm_up_angles: Dict[str, float] | None = None)

Bases: object

Dataclass for system identification specifications.

amplitude_list: List[float]
decay_rate: float = 0.1
direction: float = 1
final_frequency: float = 10.0
initial_frequency: float = 0.1
kp_list: List[float] | None = None
warm_up_angles: Dict[str, float] | None = None

toddlerbot.policies.talk_arya module

toddlerbot.policies.talk_toddy module

toddlerbot.policies.teleop_follower_pd module

Teleoperation follower policy with data logging capabilities.

This module implements the follower robot policy for teleoperation systems, handling remote commands while logging interaction data for training.

class toddlerbot.policies.teleop_follower_pd.TeleopFollowerPDPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], joystick: Joystick | None = None, cameras: List[Camera] | None = None, zmq_receiver: ZMQNode | None = None, zmq_sender: ZMQNode | None = None, ip: str = '', fixed_command: ndarray[Any, dtype[float32]] | None = None, use_torso_pd: bool = True, task: str = '')

Bases: BalancePDPolicy

Teleoperation follower policy for the follower robot of ToddlerBot.

close(exp_folder_path)

Move logged dataset files to experiment folder.

get_arm_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Retrieves the current positions of the arm motors.

If the arm motor positions have been explicitly set, returns those values. Otherwise, extracts and returns the positions from the manipulation motor positions using predefined arm motor indices.

Parameters:

obs (Obs) – The observation object containing relevant data.

Returns:

An array of arm motor positions.

Return type:

npt.NDArray[np.float32]

get_neck_motor_pos(obs: Obs) ndarray[Any, dtype[float32]]

Retrieves the current positions of the neck motors.

If the neck motor positions have been explicitly set, returns those values. Otherwise, extracts and returns the positions from the manipulation motor positions using predefined arm motor indices.

Parameters:

obs (Obs) – The observation object containing relevant data.

Returns:

An array of arm motor positions.

Return type:

npt.NDArray[np.float32]

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a step in the control loop, adjusting motor targets based on the task and logging data.

Parameters:
  • obs (Obs) – The current observation containing time and motor positions.

  • is_real (bool, optional) – Flag indicating if the step is in a real environment. Defaults to False.

Returns:

A tuple containing control inputs and the updated motor target positions.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.teleop_joystick module

Joystick-based teleoperation policy with multi-mode switching.

This module implements a comprehensive joystick control system that dynamically switches between different behavior policies (walk, teleop, reset, etc.) based on user input.

class toddlerbot.policies.teleop_joystick.TeleopJoystickPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], ip: str = '', run_name: str = '')

Bases: BasePolicy

close(exp_folder_path: str)

Save recorded camera frames from manipulation policies as videos.

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Processes the current observation and determines the appropriate control inputs and motor targets based on the active policy.

Parameters:
  • obs (Obs) – The current observation data.

  • is_real (bool, optional) – Flag indicating whether the operation is in a real environment. Defaults to False.

Returns:

A tuple containing the control inputs and the motor target values.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

toddlerbot.policies.teleop_leader module

toddlerbot.policies.walk module

Walking locomotion policy with phase-based gait control.

This module implements a walking policy that generates coordinated bipedal locomotion using cyclic phase signals and command interpretation.

class toddlerbot.policies.walk.WalkPolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], path: str, joystick: Joystick | None = None, fixed_command: ndarray[Any, dtype[float32]] | None = None)

Bases: MJXPolicy

Walking policy for the toddlerbot robot.

get_command(obs: Obs, control_inputs: Dict[str, float]) ndarray[Any, dtype[float32]]

Generates a command array based on control inputs for walking.

Parameters:

control_inputs (Dict[str, float]) – A dictionary containing control inputs with keys ‘walk_x’, ‘walk_y’, and ‘walk_turn’.

Returns:

A numpy array representing the command, with the first five elements as zeros and the remaining elements scaled by the command discount factor.

Return type:

npt.NDArray[np.float32]

get_phase_signal(time_curr: float)

Calculate the phase signal as a 2D vector for a given time.

Parameters:

time_curr (float) – The current time for which to calculate the phase signal.

Returns:

A 2D vector containing the sine and cosine components of the phase signal, with dtype np.float32.

Return type:

np.ndarray

step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Executes a control step based on the observed state and updates the standing status.

Parameters:
  • obs (Obs) – The current observation of the system state.

  • is_real (bool, optional) – Flag indicating whether the step is being executed in a real environment. Defaults to False.

Returns:

A tuple containing the control inputs as a dictionary and the motor target as a NumPy array.

Return type:

Tuple[Dict[str, float], npt.NDArray[np.float32]]

Module contents

Policy framework for ToddlerBot robot control and behavior.

This module provides the base infrastructure for implementing robot control policies, including support for various behavioral patterns such as walking, standing, manipulation, and teleoperation modes.

The policy framework supports: - Unified interface for different control strategies - Smooth transitions between motor positions - Configurable preparation phases for policy initialization - Integration with both simulation and real robot hardware

All policies inherit from BasePolicy and implement domain-specific control logic while maintaining consistent interfaces.

class toddlerbot.policies.BasePolicy(name: str, robot: Robot, init_motor_pos: ndarray[Any, dtype[float32]], control_dt: float = 0.02, prep_duration: float = 2.0, n_steps_total: float = inf)

Bases: ABC

Abstract base class for all robot control policies.

This class defines the common interface and shared functionality for robot control policies. It handles initialization, motor group indexing, preparation trajectories, and provides a consistent framework for implementing different behavioral patterns.

All concrete policy implementations must inherit from this class and implement the abstract step() method to define their specific control logic.

close(exp_folder_path: str = '')

Clean up resources and save any final data.

Parameters:

exp_folder_path – Optional path to save experiment data or logs. Defaults to empty string.

reset()

Reset the policy state for a new episode or run.

This method is called to reinitialize the policy state when starting a new control episode. Subclasses can override this method to implement specific reset behavior.

abstractmethod step(obs: Obs, sim: BaseSim) Tuple[Dict[str, float], ndarray[Any, dtype[float32]]]

Execute one control step of the policy.

Parameters:
  • obs – Current observation containing robot state information.

  • sim – Simulation or robot interface for sending control commands.

Returns:

  • Dictionary mapping motor names to target positions

  • Array of motor positions for all motors

Return type:

A tuple containing