Plunder: Probabilistic Program Synthesis for Learning from Unlabeled and Noisy Demonstrations
Imitation Learning (IL) is a popular approach for teaching robots how to perform novel tasks using only human demonstrations. Plunder is a novel Programmatic Imitation Learning (PIL) algorithm that integrates a probabilistic program synthesizer within an iterative Expectation-Maximization (EM) framework. Our method is robust against actuation errors in demonstrations, and the synthesized probabilistic policies intuitively reflect the uncertainties commonly found in human demonstrations. We benchmark Plunder against several established IL techniques and demonstrate its superiority across five challenging imitation learning tasks in the presence of noise. Plunder policies achieve 95% accuracy in matching the given demonstrations, outperforming the next best baseline by 19%. Additionally, policies generated by Plunder successfully complete the tasks 17% more frequently than the next best baseline.
Stop Sign (SS): acceleration control in a straight line.
Pass Traffic (PT): switching lanes in highway traffic. See here for the version with human demonstrations.
Merge (MG): merging from the leftmost to the rightmost lane in highway traffic. See here for the version with human demonstrations.
Pick and Place (PP): controlling a robot arm to grab an object and move it to a designated location.
Stack (ST): stacking two boxes on top of each other at a designated location.
Task | Environment | Action Labels | Observed Variables | State Variables | Training Set Size | Testing Set Size |
---|---|---|---|---|---|---|
SS | AV | ACC, DEC, CON | acc | a_min, a_max, v, v_max, target, dist_to_target | 10 | 20 |
PT | AV | FASTER, SLOWER, LANE_LEFT, LANE_RIGHT | acc, steer | x, v, front_x, front_v, right_x, right_v, left_x, left_v | 10 | 20 |
MG | AV | FASTER, SLOWER, LANE_LEFT, LANE_RIGHT | acc, steer | x, v, front_x, front_v, right_x, right_v, left_x, left_v | 10 | 20 |
PP | RA | MOVE_TO_CUBE, MOVE_TO_TARGET | dx, dy, dz, d_end | x, y, z, block_x, block_y, block_z, target_x, target_y, target_z | 5 | 15 |
ST | RA | MOVE_TO_CUBE1, MOVE_TO_CUBE2, MOVE_TO_TARGET, GRASP, LIFT | dx, dy, dz, d_end | x, y, z, block_x, block_y, block_z, block2_x, block2_y, block2_z, target_x, target_y, target_z | 10 | 10 |
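To make the table concrete, the following is a hypothetical sketch (in C++) of how a single timestep of a Stop Sign (SS) demonstration could be represented, using the variables listed above. The type and field names are illustrative assumptions, not the actual Plunder data structures.

```cpp
// Hypothetical representation of one Stop Sign (SS) demonstration timestep,
// based on the variables in the table above. Names are illustrative only.
enum class SSAction { ACC, DEC, CON };  // high-level action labels

struct SSTimestep {
  // State variables available to the policy.
  double a_min, a_max;     // acceleration limits
  double v, v_max;         // current and maximum velocity
  double target;           // stop position
  double dist_to_target;   // remaining distance to the stop position
  // Observed (noisy) low-level action recorded in the demonstration.
  double acc;
  // High-level action label: unobserved in the demonstration and inferred by Plunder.
  SSAction label;
};
```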
For full policies, please visit each module’s page.
Metric | SS | PT | MG | PP | ST | PT_H | MG_H |
---|---|---|---|---|---|---|---|
AST size (total) | 25 | 96 | 46 | 22 | 55 | 196 | 98 |
Average size per transition | 8.33 | 9.6 | 7.67 | 22 | 6.11 | 16.33 | 12.25 |
Visualizes the accuracy of action labels each learned policy produces on the test trajectories.
Visualizes the log-likelihood of observations generated by each policy on the test trajectories.
Visualizes the degradation of performance of each approach, measured both by accuracy and log-likelihood of observations.
Visualizes convergence for each task. More graphs can be found on each module’s page.
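For reference, the two metrics above could be computed roughly as in the following sketch. The Gaussian observation model and the function names are assumptions made for illustration, not the exact evaluation code used by Plunder.

```cpp
#include <cmath>
#include <vector>

// Fraction of timesteps where the predicted label matches the ground-truth label.
double LabelAccuracy(const std::vector<int>& predicted,
                     const std::vector<int>& ground_truth) {
  int correct = 0;
  for (size_t t = 0; t < predicted.size(); ++t) {
    if (predicted[t] == ground_truth[t]) ++correct;
  }
  return static_cast<double>(correct) / predicted.size();
}

// Log-likelihood of the observed low-level actions given the policy's predicted
// actions, assuming i.i.d. Gaussian observation noise with standard deviation sigma.
double ObservationLogLikelihood(const std::vector<double>& observed,
                                const std::vector<double>& predicted,
                                double sigma) {
  const double kPi = 3.14159265358979323846;
  double ll = 0.0;
  for (size_t t = 0; t < observed.size(); ++t) {
    const double err = observed[t] - predicted[t];
    ll += -0.5 * std::log(2.0 * kPi * sigma * sigma) - (err * err) / (2.0 * sigma * sigma);
  }
  return ll;
}
```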
Each approach’s policy was tested by running it in the environment 100 times, and the number of successful runs was recorded. The results are visualized below, along with our metrics for what is considered a successful run (a minimal sketch of the evaluation loop appears after the table).
Task | Success Metric |
---|---|
SS | Coming to a stop near the target |
PT | Maintaining a minimum velocity and staying in a valid lane for a specified period of time |
MG | Reaching the rightmost lane, starting from the leftmost lane |
PP | Placing a randomly-initialized cube at a randomly-initialized target position |
ST | Placing two randomly-initialized cubes in a stack at a randomly-initialized target position, in the correct order |
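The sketch below shows one way this success-rate evaluation could look, assuming a hypothetical `Environment` interface and a scalar observation/action for brevity; it is a stand-in for the actual simulators and evaluation scripts, not a copy of them.

```cpp
#include <functional>

// Hypothetical environment interface; a concrete simulator would implement it.
// Observations and actions are reduced to single doubles for brevity.
class Environment {
 public:
  virtual ~Environment() = default;
  virtual void Reset() = 0;
  virtual double Observation() const = 0;
  virtual void Step(double action) = 0;
  virtual bool Done() const = 0;
  virtual bool Success() const = 0;  // task-specific metric from the table above
};

// Roll the policy out `num_runs` times and report the fraction of successful runs.
double SuccessRate(Environment& env,
                   const std::function<double(double)>& policy,
                   int num_runs = 100) {
  int successes = 0;
  for (int run = 0; run < num_runs; ++run) {
    env.Reset();
    while (!env.Done()) env.Step(policy(env.Observation()));
    if (env.Success()) ++successes;
  }
  return static_cast<double>(successes) / num_runs;
}
```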
Our particle filter follows a standard implementation and can be found here: https://github.com/ut-amrl/plunder/blob/main/particleFilter/pf.h
The algorithm is as follows (an illustrative sketch is given after the list):
1. Initialize all particles to start with the default label.
2. For i = 1..n:
   3. Forward-propagate particles by sampling a new label for each particle from the probabilistic policy π.
   4. Reweight particles using the observation model.
   5. Normalize the weights.
   6. Resample particles to remove ineffective (low-weight) particles.
7. Trace back particle lineages to acquire full label trajectories.
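Below is a compact, illustrative C++ sketch of these steps, assuming the policy is given as a sampler π(label | previous label, timestep) and the observation model as a per-timestep log-likelihood. The function and type names are placeholders; this is not the actual pf.h implementation.

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

struct Particle {
  std::vector<int> labels;  // label lineage (one label per timestep so far)
  double weight = 1.0;
};

// pi: samples the next high-level label given the previous label and timestep.
// obs_ll: log-likelihood of the observed low-level action at time t under a label.
std::vector<Particle> RunParticleFilter(
    int num_particles, int horizon, int default_label,
    const std::function<int(int prev_label, int t, std::mt19937& rng)>& pi,
    const std::function<double(int label, int t)>& obs_ll) {
  std::mt19937 rng(0);
  std::vector<Particle> particles(num_particles);
  for (auto& p : particles) p.labels.push_back(default_label);   // step 1

  for (int t = 1; t < horizon; ++t) {                             // step 2
    double total_weight = 0.0;
    for (auto& p : particles) {
      p.labels.push_back(pi(p.labels.back(), t, rng));            // step 3: propagate
      p.weight *= std::exp(obs_ll(p.labels.back(), t));           // step 4: reweight
      total_weight += p.weight;
    }
    for (auto& p : particles) p.weight /= total_weight;           // step 5: normalize

    // Step 6: resample in proportion to weight to drop ineffective particles.
    std::vector<double> weights;
    for (const auto& p : particles) weights.push_back(p.weight);
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    std::vector<Particle> resampled;
    resampled.reserve(num_particles);
    for (int i = 0; i < num_particles; ++i) resampled.push_back(particles[pick(rng)]);
    for (auto& p : resampled) p.weight = 1.0 / num_particles;
    particles = std::move(resampled);
  }
  // Step 7: each surviving particle's `labels` vector is a full traced-back trajectory.
  return particles;
}
```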
@ARTICLE{10493855,
author={Xin, Jimmy and Zheng, Linus and Rahmani, Kia and Wei, Jiayi and Holtz, Jarrett and Dillig, Isil and Biswas, Joydeep},
journal={IEEE Robotics and Automation Letters},
title={Programmatic Imitation Learning From Unlabeled and Noisy Demonstrations},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Task analysis;Trajectory;Probabilistic logic;Noise measurement;Training data;Noise;Approximation algorithms},
doi={10.1109/LRA.2024.3385691}}