Abstract

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators collect demonstrations blindly, without knowing the underlying policy's weaknesses, which leads to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, effectively closing the learning loop in minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles the data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop also boosts sample efficiency by up to 2× in distributed environments with a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.

One-sentence Summary

Researchers from Shanghai Jiao Tong University and Noematrix Ltd. introduce RoboPocket, a smartphone-based system that uses AR Visual Foresight to enable robot-free instant policy iteration, allowing users to proactively identify failures and refine policies in minutes while doubling data efficiency compared to traditional offline methods.

Key Contributions

- RoboPocket addresses the scalability bottleneck in robot learning by transforming passive handheld data collection into an active, computationally guided workflow that provides real-time on-device feedback for higher quality demonstrations.
- The system introduces a novel Robot-Free Instant Policy Iteration paradigm that uses AR Visual Foresight to visualize predicted trajectories, allowing users to proactively identify and correct policy weaknesses without physical robot deployment.
- Experiments across diverse manipulation tasks demonstrate that this approach adheres to data scaling laws and achieves up to a 2× improvement in data efficiency compared to offline strategies while enabling rapid distributed learning.

Introduction

Scaling imitation learning in robotics is hindered by the high cost and logistical difficulty of collecting diverse, high-quality data from physical robots. Prior handheld interfaces allow for robot-free data collection but operate in an open-loop manner, forcing users to record demonstrations blindly without knowing where the current policy fails. Conversely, interactive methods that correct these failures require physical robot deployment, which is slow, risky, and impossible to scale across distributed environments. The authors introduce RoboPocket, a system that transforms a consumer smartphone into an intelligent co-pilot for robot learning by using Augmented Reality Visual Foresight to project the policy's predicted trajectory directly onto the user's screen. This approach enables users to proactively identify and correct policy weaknesses in minutes without a physical robot, while an asynchronous online finetuning pipeline instantly updates the model with new data to close the learning loop.

Dataset

Dataset Composition and Sources: The authors construct a dataset for the "Mouse Arrangement" task to validate data scaling laws, drawing from 32 distinct environments and 47 unique object pairs. The environments span both indoor and outdoor settings to ensure diverse lighting conditions and textures, while object pairs are formed by combining various mice and mouse pads.
Key Details for Each Subset:
- Environment Selection: Two object pairs are randomly selected for data collection within each of the 32 environments.
- Demonstration Volume: The team collects 25 demonstrations for every single environment-object pair combination.
- Evaluation Setup: Testing occurs across 3 different scenes, utilizing 2 initial robot poses and 3 initial object poses to assess generalization.
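The figures above imply a total dataset size and an evaluation budget that the summary does not state outright. The short calculation below is a sketch under two assumptions that the source does not confirm: every environment-object pair combination contributes exactly 25 demonstrations, and evaluation covers the full combination of scenes, robot poses, and object poses. The constant names are illustrative.

```python
# Counts taken from the dataset description above.
NUM_ENVIRONMENTS = 32
PAIRS_PER_ENVIRONMENT = 2
DEMOS_PER_COMBINATION = 25

# Evaluation factors taken from the evaluation setup above.
EVAL_SCENES = 3
EVAL_ROBOT_POSES = 2
EVAL_OBJECT_POSES = 3

# Assumption: every environment-object pair combination is fully collected.
combinations = NUM_ENVIRONMENTS * PAIRS_PER_ENVIRONMENT
total_demonstrations = combinations * DEMOS_PER_COMBINATION

# Assumption: evaluation uses the full factorial of the three factors.
eval_trials = EVAL_SCENES * EVAL_ROBOT_POSES * EVAL_OBJECT_POSES

print(f"environment-object combinations: {combinations}")
print(f"total demonstrations: {total_demonstrations}")
print(f"evaluation trials per policy: {eval_trials}")
```

Under these assumptions the dataset holds 64 combinations and 1,600 demonstrations, and each policy is evaluated on 18 trials.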
Model Usage and Training Strategy: Following the protocol from Data Scaling Laws, the authors use this dataset to verify that their RoboPocket system generates high-quality data adhering to power-law scaling relationships. The study emphasizes that increasing diversity in environments and objects is more critical for zero-shot generalization than simply increasing the number of demonstrations per scene.
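A power-law relationship of the form y = a * x^b appears as a straight line on log-log axes, so it can be checked with an ordinary least-squares fit in log space. The sketch below illustrates that check. The function name `fit_power_law` and the data points are hypothetical placeholders, not measurements from the paper, and the metric being fitted is left abstract.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                        # slope in log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log space = log(a)
    return a, b

# Hypothetical example: an error-like metric that shrinks as the number
# of training environments grows. These values are synthetic.
num_environments = [1, 2, 4, 8, 16, 32]
metric = [2.0 * n ** -0.5 for n in num_environments]

a, b = fit_power_law(num_environments, metric)
print(f"fitted: y = {a:.3f} * x^{b:.3f}")
```

With real evaluation scores in place of the synthetic values, a good linear fit in log space would be consistent with the power-law behavior described above.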
Processing and Hardware Configuration:
- Physical Setup: Data collection utilizes a Flexiv Rizon 4 robot arm with a Robotiq 2F-85 adaptive gripper fitted with TPU soft fingers to match the handheld collector.
- Data Streaming: An iPhone mounted on the gripper streams camera feeds in real time to a workstation acting as both the Data Serving Node and Training Server.
- Infrastructure: The system runs on a workstation equipped with an Intel Core i9-12900K CPU and NVIDIA GeForce RTX 3090 GPU, powered by an EcoFlow DELTA 3 MAX portable station.
- Inference: A separate workstation with an Intel Core i9-13900K CPU and NVIDIA GeForce RTX 4090 GPU serves as the Inference Server during Robot-free Instant Policy Iteration.

Method

The authors propose RoboPocket, a system designed to move from passive data recording to computationally guided learning. The framework diagram contrasts the traditional offline iteration loop, characterized by prolonged feedback and limited scenarios, with the proposed instant policy update process, which operates without a physical robot. The new workflow supports distributed environments and instant policy updates through a three-step cycle: updating the policy, following the policy's intent, and collecting corrections.

The system relies on a specialized hardware-software co-design to ensure physical consistency and real-time interaction. The hardware and software interface diagram details the isomorphic gripper, the fisheye lens, and the AR-based interaction design. The hardware architecture uses an iPhone Pro as an Edge-Compute Hub to run real-time visual-inertial odometry (VIO) and kinematic solving. It features an isomorphic adaptive gripper that replicates the underactuated dynamics of the target robot to minimize the embodiment gap. Additionally, a custom fisheye lens expands the visual context, while a magnetic encoder captures gripper width with high fidelity. On the software side, the interface provides active data verification through SLAM monitoring and an on-device inverse kinematics (IK) solver, alongside an AR trajectory replay feature that allows users to visualize the end-effector path in real time.

The core research question driving the system design is how to efficiently collect the specific data distributions that the robot actually needs. The authors formulate the robotic manipulation task as a Markov Decision Process (MDP) defined by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$. Standard imitation learning trains a policy $\pi_{\theta}(\mathbf{a}_t|\mathbf{s}_t)$ on a static dataset to minimize its divergence from the expert distribution. However, due to compounding errors, the policy inevitably encounters out-of-distribution (OOD) states. Formally, the objective is to minimize the loss under the state distribution $d_{\pi}$ induced by the learned policy itself, where $\pi^{*}$ denotes the expert policy and $\ell$ a per-state loss:

$J(\pi) = \mathbb{E}_{\mathbf{s} \sim d_{\pi}} \left[ \ell\left(\pi(\mathbf{s}), \pi^{*}(\mathbf{s})\right) \right]$
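To make the role of the induced distribution concrete, the toy sketch below estimates the same loss twice: once on states visited by the expert, and once on states visited by the learned policy itself. Everything in it is an illustrative assumption rather than the paper's setup: a one-dimensional state, additive dynamics, a squared-error loss, and a learner that matches the expert only where |s| <= 1.

```python
def average_loss(behavior, policy, expert, step, initial_states, horizon):
    """Mean squared action error of `policy` vs `expert` on states
    visited by rolling out `behavior` from each initial state."""
    total, count = 0.0, 0
    for s in initial_states:
        for _ in range(horizon):
            total += (policy(s) - expert(s)) ** 2
            count += 1
            s = step(s, behavior(s))  # the behavior policy drives the state
    return total / count

# Toy expert: always moves the state halfway toward zero.
def expert(s):
    return -0.5 * s

# Toy learner: matches the expert only inside its training support.
def learner(s):
    return -0.5 * s if abs(s) <= 1.0 else 0.0

# Toy dynamics: the action is added to the state.
def step(s, a):
    return s + a

starts = [0.5, 2.0]
loss_on_expert_states = average_loss(expert, learner, expert, step, starts, 4)
loss_on_induced_states = average_loss(learner, learner, expert, step, starts, 4)

print(f"loss on expert-visited states: {loss_on_expert_states}")
print(f"loss on learner-visited states J(pi): {loss_on_induced_states}")
```

In this toy case the learner never leaves the bad state when it is in control, so the loss under its own state distribution is four times the loss measured on expert rollouts. That gap is the covariate shift that interactive correction targets.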

To facilitate continuous learning, the backend employs a distributed server architecture. The system architecture diagram illustrates the flow from human operators identifying weaknesses to the training server performing online finetuning. The process begins with human operators identifying anticipated failures or OOD states in the real world. Collected corrective data is immediately streamed to the Data Serving Node. The Training Server then performs online finetuning using a weighted sampling strategy, constructing each batch with 50% of its samples from the original offline dataset and 50% from the new online dataset to prevent catastrophic forgetting. Finally, updated model weights are synchronized to the Inference Server, achieving a round-trip latency of under 150 ms. This architecture creates a tight feedback loop in which the user sees a failure, collects corrective data, and sees the AR visualization reflect the updated policy's improved behavior within minutes.
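The 50/50 batch construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_mixed_batch`, the `online_fraction` parameter, the choice to sample with replacement, and the fallback for an empty online buffer are all assumptions made for the example.

```python
import random

def sample_mixed_batch(offline, online, batch_size, online_fraction=0.5, rng=None):
    """Draw a batch with a fixed share of online (corrective) samples.

    Sampling is with replacement, so a small online buffer can still
    fill its share of every batch.
    """
    rng = rng or random.Random()
    if not online:
        # Assumed fallback: no corrections yet, so use offline data only.
        return rng.choices(offline, k=batch_size)
    n_online = round(batch_size * online_fraction)
    n_offline = batch_size - n_online
    batch = rng.choices(offline, k=n_offline) + rng.choices(online, k=n_online)
    rng.shuffle(batch)  # avoid a fixed offline-then-online ordering
    return batch

# Hypothetical buffers: many offline demonstrations, a few new corrections.
offline_data = [("offline", i) for i in range(100)]
online_data = [("online", i) for i in range(5)]

batch = sample_mixed_batch(offline_data, online_data, 8, rng=random.Random(0))
num_online = sum(1 for source, _ in batch if source == "online")
print(f"batch size: {len(batch)}, online samples: {num_online}")
```

Fixing the ratio per batch, rather than sampling uniformly from the merged dataset, keeps the handful of new corrections from being drowned out by the much larger offline set while still rehearsing the original data.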

Experiment

- System capability verification indicates that RoboPocket achieves high-fidelity trajectory tracking with greater stability than standard SLAM systems, while significantly reducing data collection time through online processing and ensuring physically plausible motion data.
- Validation of data scaling laws shows that policy performance on diverse object arrangements follows a power law, supporting the system's suitability for large-scale robot learning.
- Experiments on four challenging manipulation tasks show that Robot-Free Instant Policy Iteration breaks the performance plateau of standard imitation learning by enabling targeted collection of failure recovery data, achieving results comparable to expert manual intervention without physical robot access.
- Distributed deployment across multiple environments shows that the system facilitates rapid policy adaptation and robust generalization, allowing users to substantially improve success rates in new scenes with minimal interactive corrections.
- User studies indicate that non-expert participants effectively use real-time feedback and visual foresight to identify model weaknesses, collecting correction data with state coverage comparable to that of experienced experimenters.

Junjie Fang Wendi Chen Han Xue Fangyuan Zhou Tian Le Yi Wang Yuting Zhang Jun Lv Chuan Wen Cewu Lu

RoboPocket: Improve Robot Policies Instantly with Your Phone
