Robotic Offline RL from Internet Videos via Value-Function Pre-Training

Chethan Bhateja*, Derek Guo*, Dibya Ghosh*, Anikait Singh, Manan Tomar, Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar
*equal contribution

Results

Scenario 1 (Variability in Object and Gripper Position)

In Scenario 1, we evaluate each method while randomizing the initial positions of the objects and the gripper. We showcase successful trajectories collected by our method and compare them to unsuccessful ones collected by other methods. For success rates, see the quantitative results section below. Left: V-PTR (ours). Right (clockwise from top left): PTR (Kumar et al. 2023), VIP (Ma et al. 2022), R3M (Nair et al. 2022), and masked visual pre-training (MAE; Xiao et al. 2022).

Croissant from Bowl

V-PTR (ours)
PTR
VIP
MAE
R3M

Sweet Potato on Plate

V-PTR (ours)
PTR
VIP
MAE
R3M

Knife in Pot

V-PTR (ours)
PTR
VIP
MAE
R3M

Cucumber in Pot

V-PTR (ours)
PTR
VIP
MAE
R3M

Open Microwave

V-PTR (ours)
PTR
VIP

Sweep Beans

V-PTR (ours)
PTR
VIP

Scenario 2 (Novel Distractor Objects)

We introduce various distractor objects to test whether the robot can identify the correct object to pick up. The videos are ordered as in Scenario 1.

Croissant from Bowl

V-PTR (ours)
PTR
VIP
MAE
R3M

Sweet Potato on Plate

V-PTR (ours)
PTR
VIP
MAE
R3M

Knife in Pot

V-PTR (ours)
PTR
VIP
MAE
R3M

Cucumber in Pot

V-PTR (ours)
PTR
VIP
MAE
R3M

Scenario 3 (Novel Target Objects to Manipulate)

We introduce novel target objects to test whether the robot is able to manipulate them. The videos are ordered as in Scenario 1.

Various Objects in Colander

V-PTR (ours)
PTR
VIP
MAE
R3M