Video Moment Annotation — Food Cooking Clips

Annotate precise temporal segments in food and cooking videos: mark start and end times, classify actions and objects, and write visually grounded descriptions for CLIP-style training. This is entry-level, remote contract work performed in Label Studio and paid at $0.05 per label.

Category: Image & Video Annotation
Work arrangement: 100% remote, paid per task
Compensation: $0.05/label
Eligibility: Worldwide
Experience: Entry
Posted: Nov 19, 2025

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect contributors with projects that teach AI systems how to see, hear, and understand the world.

This role sits inside a fast-growing industry where human reviewers prepare the examples modern AI models learn from. OpenTrain makes it simple to start and grow a remote, flexible career doing this work.

About AI Training Work

AI training (data labeling/annotation) is the human side of building machine intelligence. Reviewers annotate images, video, and audio or write and rate model outputs so models learn to recognize actions, objects, and context.

Most projects are remote and flexible: you can work part time, choose hours that fit your schedule, and directly influence how state-of-the-art AI behaves.

The Role

You will identify and mark temporal segments in food and cooking videos that match natural-language queries, add action and object classifications, and write visually grounded descriptions used for CLIP-style training.

This is contract, remote work performed in Label Studio. Work is paid per label at $0.05 and is open worldwide.

Data type: Video
Label types: Action recognition and classification
Tool: Label Studio
Employment: Contractor, remote, worldwide
Payment: Pay-per-label ($0.05/label)

What You'll Do

Follow the query for each task, watch the video, and mark every matching temporal segment with precise start and end times. Classify the action type and any visible objects, then write a short visual proxy description suitable for CLIP training.

Read the query to understand the target action/event
Watch the full video and identify all segments that match the query
Mark precise start and end times on the timeline
Select action type and objects present from the provided taxonomy
Write visually grounded, specific descriptions (what you see, not assumptions)
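
The steps above produce one record per matching segment. As a minimal sketch of what such a record might hold, the Python below uses hypothetical field names (`start_s`, `end_s`, `objects`, and so on); it is not the Label Studio export format or the project's actual schema, only an illustration of the fields a complete annotation needs.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentAnnotation:
    """One marked segment. Field names are hypothetical, not a Label Studio schema."""

    query: str
    start_s: float
    end_s: float
    action: str
    objects: list[str] = field(default_factory=list)
    description: str = ""

    def problems(self, video_length_s: float) -> list[str]:
        """Return readable issues; an empty list means the record looks complete."""
        issues = []
        if self.start_s < 0 or self.end_s > video_length_s:
            issues.append("segment falls outside the video")
        if self.end_s <= self.start_s:
            issues.append("end time must be after start time")
        if not self.action:
            issues.append("missing action type")
        if not self.description.strip():
            issues.append("missing visual description")
        return issues


# Illustrative values only; the description reuses an example from this guide.
example = SegmentAnnotation(
    query="person chopping vegetables",
    start_s=12.0,
    end_s=19.5,
    action="chopping",
    objects=["knife", "tomato", "cutting board"],
    description="person in white chef coat slicing red tomatoes on wooden board",
)
print(example.problems(video_length_s=60.0))  # prints []
```

A check like `problems` is only a self-review aid before submitting; the project's own review process decides which labels are accepted.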

Domain Actions & Examples

You will encounter common food-video actions such as chopping, mise en place, mixing, kneading, sautéing, stirring, deglazing, tasting, seasoning, boiling, grilling, baking, plating, garnishing, sauce drizzle, introductions, finished-dish reveals, and eating reactions.

Descriptions should be action-focused and CLIP-friendly — for example: "person in white chef coat slicing red tomatoes on wooden board" or "hand stirring wooden spoon in stainless steel pot".

Quality Guidelines & Confidence

High-quality annotations are precise, complete, and visually grounded. Pay careful attention to the timeline and cover every instance that matches the query.

Use the provided confidence scale when boundaries are uncertain so training teams can filter or review lower-confidence labels.

Boundary accuracy: mark within 0.5 seconds when possible
Query specificity: ensure the segment clearly matches the query
Visual descriptions must report what you see, not background knowledge
Complete coverage: mark all matching segments, not just the first
Confidence levels: High (within 0.5s), Medium (within 1–2s), Low (ambiguous)
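
The confidence scale can be read as a simple mapping from how sure you are of a boundary to a level. The sketch below is one possible reading, not the project's official rule: the function name is hypothetical, and because the guide does not say how to treat uncertainty between 0.5 and 1 second or above 2 seconds, the code assumes Medium and Low respectively.

```python
from typing import Optional


def confidence_level(uncertainty_s: Optional[float]) -> str:
    """Map estimated boundary uncertainty in seconds to High, Medium, or Low.

    Pass None when the boundary is ambiguous and cannot be estimated.
    """
    if uncertainty_s is None:
        return "Low"
    if uncertainty_s < 0:
        raise ValueError("uncertainty cannot be negative")
    if uncertainty_s <= 0.5:
        return "High"
    if uncertainty_s <= 2.0:
        # Assumption: the guide lists Medium as 1-2 s; values between
        # 0.5 s and 1 s are treated as Medium here.
        return "Medium"
    # Assumption: anything less certain than 2 s is treated as Low.
    return "Low"


print(confidence_level(0.3))   # prints High
print(confidence_level(1.5))   # prints Medium
print(confidence_level(None))  # prints Low
```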

Common Mistakes To Avoid

Avoid marking segments that are too short or too long, using vague descriptions, or adding non-visual assumptions. Keep descriptions concrete and action-oriented.

Do not use abstract adjectives like "delicious" — describe visual cues instead
Avoid including unrelated footage before or after the action
Do not miss additional occurrences of the same action later in the video
Do not invent objects or intents that are not visible on screen
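
A quick self-check for non-visual wording might look like the sketch below. The word list is illustrative only: "delicious" comes from this guide, and the other entries are assumed examples rather than an official blocklist.

```python
import re

# Illustrative list; only "delicious" is named in the guide.
NON_VISUAL_WORDS = {"delicious", "tasty", "yummy", "amazing"}


def flag_non_visual_words(description: str) -> list[str]:
    """Return the non-visual words found in a description, sorted alphabetically."""
    words = re.findall(r"[a-z']+", description.lower())
    return sorted(set(words) & NON_VISUAL_WORDS)


print(flag_non_visual_words("delicious pasta on a plate"))
# prints ['delicious']
print(flag_non_visual_words("hand stirring wooden spoon in stainless steel pot"))
# prints []
```

A word list cannot catch invented objects or intents, so it complements rather than replaces rewatching the segment.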

Requirements

This project requires data labeling experience, a graduate degree, and a strong understanding of food and cooking videos. The client accepts entry-level contributors, but the graduate-degree requirement is mandatory.

You must be comfortable using Label Studio, following written annotation guides, and working as an independent contractor. There are no mandated hours—work is flexible and on-demand.

Education: Graduate degree (required)
Experience: Data labeling experience (required)
Domain knowledge: Familiarity with food/cooking video content (required)
Tools: Experience with Label Studio, or willingness to learn it
Work arrangement: Contractor, remote, flexible schedule

How It Works & How to Apply

Create an OpenTrain account, build your profile, and apply to get started. You will receive tasks through Label Studio, complete annotations following this guide, and get paid per label at the stated rate.

Because this is paid per label, throughput varies by task length and complexity. Follow quality standards to maximize accepted labels and earning potential.

Sign up on OpenTrain and complete any qualification tests required by the project
Work remotely and submit annotations in Label Studio
Payment is per accepted label at $0.05; contractor agreement applies
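
Because payment is per accepted label, earnings are the accepted-label count multiplied by the $0.05 rate. The sketch below shows that arithmetic; the label count of 400 is a hypothetical input, not a throughput estimate for this project.

```python
from decimal import Decimal

RATE_PER_LABEL = Decimal("0.05")  # stated rate from this posting


def earnings(accepted_labels: int) -> Decimal:
    """Return earnings in dollars for a number of accepted labels."""
    if accepted_labels < 0:
        raise ValueError("accepted_labels cannot be negative")
    return RATE_PER_LABEL * accepted_labels


print(earnings(400))  # prints 20.00
```

Rejected labels do not count toward the total, which is why meeting the quality standards directly affects earnings.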