Diversity-driven agentic framework integrated with self-reflection and memory

RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework

An automated framework for generating diverse, physically plausible manipulation tasks to enhance Vision-Language-Action model pre-training

Figure: RoboGene framework overview.

Abstract

The pursuit of general-purpose robotic manipulation is hindered by the scarcity of diverse, real-world interaction data. Unlike vision or language data, which can be gathered from the web, robotic data must be collected actively, at prohibitive physical cost.

Existing manual methods are unscalable and biased toward common tasks. To address this, we introduce RoboGene, an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks across single-arm, dual-arm, and mobile robots.

RoboGene integrates three core components: diversity-driven sampling, self-reflection mechanisms, and human-in-the-loop refinement. We collect a dataset of 18k trajectories and introduce novel metrics to assess task quality and diversity.

Methodology

Figure: RoboGene method overview.

STAGE 01

Diversity-Driven Sampling

Implements a least-frequently-used (LFU) sampling strategy to explore the task space comprehensively, ensuring coverage of edge cases often overlooked by manual curation.
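
As a rough illustration of the idea, the sketch below shows one way least-frequently-used sampling could work: the sampler always picks among the items used least so far, so rarely chosen options are not starved. The class name `LFUSampler`, the `skills` list, and the random tie-breaking are assumptions made for this example; they are not taken from the RoboGene implementation.

```python
import random
from collections import Counter


class LFUSampler:
    """Hypothetical sampler: always draw from the least-used items."""

    def __init__(self, items, seed=None):
        self.items = list(items)
        # Every item starts with a usage count of zero.
        self.counts = Counter({item: 0 for item in self.items})
        self.rng = random.Random(seed)

    def sample(self):
        # Find the lowest usage count, then break ties at random.
        lowest = min(self.counts[item] for item in self.items)
        candidates = [i for i in self.items if self.counts[i] == lowest]
        choice = self.rng.choice(candidates)
        self.counts[choice] += 1
        return choice


# Illustrative skill vocabulary (assumed, not from the paper).
skills = ["pour", "sort", "handover", "open"]
sampler = LFUSampler(skills, seed=0)
draws = [sampler.sample() for _ in range(8)]
```

With four items and eight draws, each item is selected exactly twice, which is the even coverage that a frequency-based strategy is meant to provide.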

STAGE 02

Self-Reflection Validation

Enforces physical constraints through simulation, filtering out infeasible trajectories and ensuring tasks are executable.
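
The filtering step can be pictured with a minimal sketch. Here, simple payload and reach limits stand in for the simulation-based check described above; the `TaskProposal` and `RobotLimits` types, their fields, and the numeric values are all hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class TaskProposal:
    """Hypothetical task description with two physical requirements."""
    name: str
    payload_kg: float
    reach_m: float


@dataclass
class RobotLimits:
    """Hypothetical embodiment limits (a stand-in for a simulator)."""
    max_payload_kg: float
    max_reach_m: float


def violations(task, limits):
    """Return the names of violated constraints; empty means feasible."""
    issues = []
    if task.payload_kg > limits.max_payload_kg:
        issues.append("payload")
    if task.reach_m > limits.max_reach_m:
        issues.append("reach")
    return issues


def filter_feasible(tasks, limits):
    """Split proposals into kept tasks and rejected ones with reasons."""
    kept, rejected = [], {}
    for task in tasks:
        issues = violations(task, limits)
        if issues:
            rejected[task.name] = issues
        else:
            kept.append(task)
    return kept, rejected


# Made-up limits and proposals for illustration only.
limits = RobotLimits(max_payload_kg=3.0, max_reach_m=0.8)
proposals = [
    TaskProposal("Pour Water", payload_kg=0.5, reach_m=0.6),
    TaskProposal("Lift Crate", payload_kg=12.0, reach_m=0.7),
]
kept, rejected = filter_feasible(proposals, limits)
```

Recording the reason for each rejection, rather than silently dropping the task, is one way the feedback could be passed back to the generator for a revised proposal.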

STAGE 03

Human-in-the-Loop

Continuous refinement through human feedback, enabling the system to adapt to specific robot embodiments.

Experimental Setup

Hardware Setup

Evaluated across three robot platforms: single-arm manipulators, dual-arm manipulators, and mobile manipulators.

18k+ Trajectories Collected
3 Robot Platforms
300+ Manipulation Tasks
2x Better than Baselines

Diverse Real-World Tasks

Automated task generation across multiple embodiments including mobile manipulators, single-arm, and dual-arm systems.

Water Plant

Organize Desk

Assembling Crucible

Replenish Shelves

Sort Buttons

Pour Water

Handover Tape

Open Medical Box

Rinse Lettuce

Downstream Fine-tuning Performance

Quantitative assessment of pre-training efficacy and real-world execution demonstrations.

Pre-training Efficacy

Results Chart

Grill Skewers

Weigh Beakers

Build Blocks

Lubricate Gears

Zero-Shot Generalization & Robustness Analysis

Figure: Generalization Evaluation. Radar chart comparing RoboGene (red) against baselines across six categories.

Novel Objects

Illumination Changes

Static Distractors

Background Variations

Instruction Changes

In-Distribution