VLA Video Annotation Specialist (Level B)

Annotate humanoid video data in Encord to train vision-language-action (VLA) models. Level B (intermediate) contractor, part-time role, 20+ hrs/week, pay around $6/hr. Worldwide remote. Encord experience is preferred; otherwise, state which other platforms you've used.

Image Video Annotation

100% Remote · Hourly · $5–$8/hr

Compensation: $5–$8/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Jun 27, 2026

About OpenTrain

OpenTrain is the #1 platform for building careers in AI training and data labeling. We connect people with hands-on work that shapes how modern AI systems behave, and we grow skilled contributors through real projects and clear workflows.

About AI training and this project

AI training (data labeling) is the human work behind state-of-the-art models — preparing, annotating, and reviewing examples that models learn from. This project focuses on video annotation for humanoid robotics: you will label video frames to teach vision-language-action models how humans and robots interact in motion.

The role

We are hiring an intermediate-level (Level B) Video Annotation Specialist to label video data used to train VLA models for humanoid robotics. This is a remote, contractor, part-time role requiring 20+ hours per week. You'll work in Encord; candidates with prior Encord experience are preferred.

Job type: Contractor, Part-time
Time: 20+ hours/week
Level: Intermediate (Level B)
Platform: Encord (preferred)

What you'll do

You will annotate video footage of humanoid movement and interactions following detailed project guidelines. Accuracy and consistency are important — your labels directly affect model behavior.

Create bounding boxes and object detection labels across video frames
Apply segmentation masks for objects and persons
Label classification attributes and scene-level tags
Place keypoints on joints and pose landmarks
Work in Encord and follow frame-level annotation workflows

Requirements

Candidates must have prior experience annotating video for VLA or similar vision-language models and meet Level B experience expectations. If you have used Encord, please state it; if not, specify which annotation platforms you have worked on.

Prior experience with video annotation for VLA or related models (required)
Level B / intermediate annotation experience (required)
Experience using Encord is preferred; otherwise list platforms you've used
Ability to commit 20+ hours per week and work remotely

Compensation & schedule

This project pays on an hourly basis. The listed rate is USD $6/hr, within a range of $5–$8/hr. Hours are flexible, but a consistent commitment of 20+ hours per week is required.

Payment type: Pay per hour (USD)
Listed rate: $6/hr (range $5–$8/hr)
Contractor / part-time engagement, remote, worldwide

How to apply

Apply through your OpenTrain profile and include a brief summary of your relevant annotation experience. State whether you have used Encord; if not, list the platforms you have worked with and examples of past VLA/video projects. We'll share more project videos and guidelines when we connect.

Include: Level B experience details and platforms you've used (Encord preferred)
Mention availability for 20+ hours/week and timezone if relevant
Provide examples or brief descriptions of past video/VLA annotation work