NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan · Feb 24, 2026 · Citations: 0

Abstract

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with \modelname (\textbf{No} \textbf{R}easoning for \textbf{D}riving). Compared to existing VLAs, \modelname achieves competitive performance while being fine-tuned on $<$60\% of the data and no reasoning annotations, resulting in 3$\times$ fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. \modelname overcomes this by incorporating Dr.~GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, \modelname achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.

Human Data Lens

Uses human feedback: No
Feedback types: None
Rater population: Unknown
Unit of annotation: Trajectory
Expertise required: General

Evaluation Lens

Evaluation modes: Automatic Metrics
Agentic eval: None
Quality controls: Not reported
Confidence: 0.30
Flags: low_signal, possible_false_positive

Research Summary

Contribution Summary

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures.
However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations.
In this work, we address both challenges with \modelname (\textbf{No} \textbf{R}easoning for \textbf{D}riving).

NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Abstract

Human Data Lens

Evaluation Lens

Research Summary

Contribution Summary

Related Papers