Modeling and manipulating elasto-plastic objects are essential capabilities for robots to perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling sushi, and making pottery). However, due to the high degrees of freedom of elasto-plastic objects, significant challenges exist in virtually every aspect of the robotic manipulation pipeline, for example, representing the states, modeling the dynamics, and synthesizing the control signals. We propose to tackle these challenges by employing a particle-based representation for elasto-plastic objects in a model-based planning framework. Our system, RoboCraft, only assumes access to raw RGBD visual observations. It transforms the sensory data into particles and learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system. The learned model can then be coupled with model predictive control (MPC) algorithms to plan the robot's behavior. We show through experiments that with just 10 min of real-world robot interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various complex target shapes, including shapes that the robot has never encountered before. We perform systematic evaluations in both simulation and the real world to demonstrate the robot's manipulation capabilities.
We use four calibrated Intel RealSense D415 cameras to capture a complete point cloud of the robot’s workspace. From the raw point cloud, our algorithm (a) crops to the region where the plasticine and the tool reside, (b) extracts the plasticine point cloud by color segmentation, (c) reconstructs a watertight mesh around the point cloud with the Poisson surface reconstruction method, (d) uses the signed distance function (SDF) of the watertight mesh to sample points inside the mesh, (e) removes points lying within the SDF of the tools, and (f) performs alpha-shape surface reconstruction and uniformly samples 300 points on the reconstructed surface with Poisson disk sampling.
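Below is a minimal sketch of this perception pipeline built on Open3D (an assumed dependency, not necessarily the implementation used in the paper). Multi-camera fusion and calibration are taken as given, the color-segmentation threshold is a placeholder, and the tool-SDF removal step is left as a comment.

```python
# Sketch of the perception pipeline (assumes a recent Open3D with RaycastingScene).
import numpy as np
import open3d as o3d

def plasticine_particles(fused_pcd: o3d.geometry.PointCloud,
                         workspace_min, workspace_max,
                         n_particles: int = 300) -> np.ndarray:
    # (a) Crop to the workspace region containing the plasticine and the tool.
    bbox = o3d.geometry.AxisAlignedBoundingBox(workspace_min, workspace_max)
    pcd = fused_pcd.crop(bbox)

    # (b) Keep plasticine points by color segmentation (threshold is a placeholder).
    colors = np.asarray(pcd.colors)
    mask = colors[:, 0] > 0.5  # hypothetical "reddish plasticine" rule
    pcd = pcd.select_by_index(np.where(mask)[0].tolist())

    # (c) Watertight mesh via Poisson surface reconstruction.
    pcd.estimate_normals()
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)

    # (d) Use the mesh SDF to keep random samples that fall inside the mesh.
    scene = o3d.t.geometry.RaycastingScene()
    scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))
    lo, hi = mesh.get_min_bound(), mesh.get_max_bound()
    queries = np.random.uniform(lo, hi, size=(20000, 3)).astype(np.float32)
    sdf = scene.compute_signed_distance(o3d.core.Tensor(queries)).numpy()
    inside = queries[sdf < 0]

    # (e) In the full pipeline, points inside the tool SDF would be removed here.

    # (f) Alpha-shape reconstruction, then Poisson disk sampling of n_particles points.
    inside_pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(inside))
    alpha_mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(inside_pcd, 0.01)
    particles = alpha_mesh.sample_points_poisson_disk(number_of_points=n_particles)
    return np.asarray(particles.points)
```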
Our GNN-based dynamics model accurately predicts changes in the plasticine's state over a long-horizon gripping task. We show three examples from the top, front, and perspective views, and compare the model's predictions against the ground truth acquired from the perception module to demonstrate its accuracy. The blue dots represent the plasticine particles, and the red dots represent the tool particles.
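The following is a self-contained sketch in PyTorch, not the paper's exact architecture, of how one particle-based GNN dynamics step can be written: edges are built from spatial proximity, learned messages are aggregated at each receiving particle, and a per-particle displacement is decoded.

```python
# Illustrative single message-passing step for particle dynamics (hyperparameters are placeholders).
import torch
import torch.nn as nn

class ParticleGNNStep(nn.Module):
    """Maps current particle positions to predicted next-step positions."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.node_enc = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.edge_enc = nn.Sequential(nn.Linear(hidden + 4, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.node_dec = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, pos: torch.Tensor, radius: float = 0.02) -> torch.Tensor:
        # pos: (N, 3) positions of plasticine and tool particles.
        diff = pos[:, None, :] - pos[None, :, :]                    # (N, N, 3) pairwise offsets
        dist = diff.norm(dim=-1)                                    # (N, N) pairwise distances
        adj = (dist < radius) & ~torch.eye(len(pos), dtype=torch.bool, device=pos.device)
        src, dst = adj.nonzero(as_tuple=True)                       # edges from spatial proximity

        h = self.node_enc(pos)                                      # per-particle features
        msg = self.edge_enc(torch.cat(                              # messages carry sender features
            [h[src], diff[src, dst], dist[src, dst, None]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, msg)           # sum messages at receivers
        delta = self.node_dec(torch.cat([h, agg], dim=-1))          # predicted displacements
        return pos + delta
```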
We use a combination of sampling- and gradient-based trajectory optimization techniques to solve the planning problem. We first perform grid sampling in the simplified action space and then roll out the trained GNN-based dynamics model, taking the initial state of the plasticine and the sampled actions as input. After obtaining the predicted final state of the plasticine under each sampled action sequence, we compute the Chamfer distance between the prediction and the target state. Next, we apply gradient-based trajectory optimization to the lowest-cost trajectory to improve the solution further. The red dots and arrows represent the motion of the parallel two-finger gripper.
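A minimal sketch of this sampling-then-gradient planner is shown below, assuming a differentiable learned dynamics model `dynamics(state, action)` (e.g., a rollout of the GNN step above) and a low-dimensional candidate grid `action_grid`; both names are placeholders, and the Chamfer distance is written directly with `torch.cdist`.

```python
# Sketch: grid sampling over a simplified action space, then gradient refinement.
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).
    d = torch.cdist(a, b)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def plan_grip(dynamics, init_state, target, action_grid, n_refine=50, lr=1e-2):
    # 1) Grid sampling: score every candidate action with the learned model.
    with torch.no_grad():
        costs = torch.stack([chamfer(dynamics(init_state, a), target) for a in action_grid])
    best = action_grid[int(costs.argmin())].clone().requires_grad_(True)

    # 2) Gradient-based refinement of the lowest-cost candidate.
    opt = torch.optim.Adam([best], lr=lr)
    for _ in range(n_refine):
        opt.zero_grad()
        loss = chamfer(dynamics(init_state, best), target)
        loss.backward()
        opt.step()
    return best.detach()
```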
(a) An overview of the robot workspace. The dashed black circles show the four RGBD Intel RealSense D415 cameras mounted at the four corners of the robot table. The red cubic contour denotes the robot's manipulation area. (b) We illustrate the xyz coordinate system in the robot frame and the simplified action space of the gripping task. (c) The 3D-printed parallel two-finger gripper that the robot uses to pinch the plasticine. (d) Since some end effector poses are close to the robot's kinematic limits, we designed a rotating object stand to hold the plasticine so that the robot can rotate the stand instead of rotating its hand about the z-axis. (e) The robot rotates the object stand in three steps: (1) insert the gripper fingers into the cavities on the two sides of the stand; (2) rotate the stand along with the plasticine; (3) open the two fingers to release the stand. Then, the robot can start gripping.
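As a purely illustrative sketch of what a simplified gripping action might look like, the fields below are hypothetical placeholders rather than the paper's exact parameterization.

```python
# Hypothetical low-dimensional gripping action; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class GripAction:
    grip_center_xy: tuple  # (x, y) point in the robot frame where the fingers close
    grip_angle: float      # orientation about the z-axis, realized by rotating the object stand
    grip_width: float      # final distance between the two fingertips

# A grid of candidate actions can then be enumerated over these few parameters.
```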
On the left are the manipulation steps for the alphabet letters 'R,' 'T,' 'X,' 'A,' an hourglass, a 3D 'X,' a stamp, and a pagoda. We use black arrows to illustrate the rotation of the object stand and the motion of the robot gripper. The numbers at the top left corner of the images denote the grip index, and images that belong to the same grip are placed inside the same black contour. The sixth column shows the final result. On the right are the corresponding target point clouds acquired from expert human demonstrations using the same robot gripper to pinch the target. The last column shows the targets' visualizations in the real world. Note that these images are not used to supervise our proposed method; they merely illustrate the targets we attempt to achieve.
@article{shi2024robocraft,
title={RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks},
author={Shi, Haochen and Xu, Huazhe and Huang, Zhiao and Li, Yunzhu and Wu, Jiajun},
journal={The International Journal of Robotics Research},
volume={43},
number={4},
pages={533--549},
year={2024},
publisher={SAGE Publications}
}