Video Moment Annotation — Food Cooking Clips
Annotate precise temporal segments in food and cooking videos: mark start and end times, classify actions and objects, and write visually grounded descriptions for CLIP-style training. Entry-level, remote contract work in Label Studio, paid per label ($0.05).
Image & Video Annotation
- Compensation: $0.05/label
- Eligibility: Worldwide
- Experience: Entry
- Posted: Nov 19, 2025
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect contributors with projects that teach AI systems how to see, hear, and understand the world.
This role sits inside a fast-growing industry where human reviewers prepare the examples modern AI models learn from. OpenTrain makes it simple to start and grow a remote, flexible career doing this work.
About AI Training Work
AI training (data labeling/annotation) is the human side of building machine intelligence. Reviewers annotate images, video, and audio, or write and rate model outputs, so that models learn to recognize actions, objects, and context.
Most projects are remote and flexible: you can work part time, choose hours that fit your schedule, and directly influence how state-of-the-art AI behaves.
The Role
You will identify and mark temporal segments in food and cooking videos that match natural-language queries, add action and object classifications, and write visually grounded descriptions used for CLIP-style training.
This is remote contract work performed in Label Studio. It is paid per label at $0.05 and open to contributors worldwide.
- Data type: Video
- Label types: Action recognition and classification
- Tool: Label Studio
- Employment: Contractor, remote, worldwide
- Payment: Pay-per-label ($0.05/label)
What You'll Do
Follow the query for each task, watch the video, and mark every matching temporal segment with precise start and end times. Classify the action type and any visible objects, then write a short visual proxy description suitable for CLIP training.
- Read the query to understand the target action/event
- Watch the full video and identify all segments that match the query
- Mark precise start and end times on the timeline
- Select action type and objects present from the provided taxonomy
- Write visually grounded, specific descriptions (what you see, not assumptions)
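Each pass through the steps above produces one record per matching segment. The sketch below shows what such a record might hold and a quick self-check before submitting, assuming times are measured in seconds from the start of the video. `MomentLabel`, `validate`, and the small `ACTION_TAXONOMY` set are hypothetical illustrations; they are not Label Studio's export format or the project's real taxonomy.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy subset taken from actions named in this guide;
# the real taxonomy is provided in Label Studio and may differ.
ACTION_TAXONOMY = {"chopping", "stirring", "seasoning", "plating", "garnishing"}


@dataclass
class MomentLabel:
    """One annotated segment; start and end are seconds from the video start."""
    start: float
    end: float
    action: str
    objects: list[str] = field(default_factory=list)
    description: str = ""


def validate(label: MomentLabel, video_duration: float) -> list[str]:
    """Return the problems found; an empty list means the record passes."""
    problems = []
    if not 0.0 <= label.start < label.end <= video_duration:
        problems.append("times must satisfy 0 <= start < end <= video duration")
    if label.action not in ACTION_TAXONOMY:
        problems.append(f"action {label.action!r} is not in the taxonomy")
    if not label.description.strip():
        problems.append("description is empty")
    return problems


label = MomentLabel(12.4, 18.9, "chopping", ["tomato", "wooden board"],
                    "person in white chef coat slicing red tomatoes on wooden board")
print(validate(label, video_duration=120.0))  # []
```

A check like this catches reversed or out-of-range times, but it cannot tell whether every matching segment was found; that still depends on watching the full video.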
Domain Actions & Examples
You will encounter common food-video actions such as chopping, mise en place, mixing, kneading, sautéing, stirring, deglazing, tasting, seasoning, boiling, grilling, baking, plating, garnishing, drizzling sauce, introductions, finished-dish reveals, and eating reactions.
Descriptions should be action-focused and CLIP-friendly, for example: "person in white chef coat slicing red tomatoes on wooden board" or "hand stirring with wooden spoon in stainless steel pot".
Quality Guidelines & Confidence
High-quality annotations are precise, complete, and visually grounded. Pay careful attention to the timeline and cover every instance that matches the query.
Use the provided confidence scale when boundaries are uncertain so training teams can filter or review lower-confidence labels.
- Boundary accuracy: mark within 0.5 seconds when possible
- Query specificity: ensure the segment clearly matches the query
- Visual descriptions must report what you see, not background knowledge
- Complete coverage: mark all matching segments, not just the first
- Confidence levels: High (within 0.5s), Medium (within 1–2s), Low (ambiguous)
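One way to apply the confidence scale consistently is to map your estimated boundary error to a level. The helper below is a hypothetical sketch, not a Label Studio feature; it assumes you estimate the error of each boundary in seconds and lets the worse of the two decide. The scale as written does not name errors between 0.5 s and 1 s or above 2 s, so treating the first as Medium and the second as Low is an assumption.

```python
from typing import Optional


def confidence_level(start_error_s: Optional[float], end_error_s: Optional[float]) -> str:
    """Map estimated boundary errors (seconds) to High / Medium / Low.

    None means that boundary is ambiguous. The larger error decides.
    Assumptions: errors in (0.5, 1.0) s count as Medium; errors above 2 s count as Low.
    """
    if start_error_s is None or end_error_s is None:
        return "Low"  # an ambiguous boundary always lowers confidence
    worst = max(start_error_s, end_error_s)
    if worst <= 0.5:
        return "High"
    if worst <= 2.0:
        return "Medium"
    return "Low"


print(confidence_level(0.2, 0.4))   # High
print(confidence_level(0.3, 1.5))   # Medium
print(confidence_level(0.3, None))  # Low
```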
Common Mistakes To Avoid
Avoid marking segments that are too short or too long, using vague descriptions, or adding non-visual assumptions. Keep descriptions concrete and action-oriented.
- Do not use abstract adjectives like "delicious"; describe visual cues instead
- Avoid including unrelated footage before or after the action
- Do not miss additional occurrences of the same action later in the video
- Do not invent objects or intents that are not visible on screen
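The wording mistakes above can be partly caught before submission with a simple word check. This is a minimal sketch, assuming a hypothetical `lint_description` helper and hand-picked word lists that come neither from this guide nor from Label Studio. It only flags words, so timing mistakes such as padded or missed segments still need a human check against the timeline.

```python
import re

# Hypothetical word lists for illustration; the project guide is the authority.
ABSTRACT_WORDS = {"delicious", "tasty", "amazing", "beautiful", "yummy"}
INTENT_PHRASES = ("trying to", "wants to", "in order to")


def lint_description(text: str) -> list[str]:
    """Flag wording that is not visually grounded; an empty list means no flags."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    issues = [f"abstract adjective: {w}" for w in words if w in ABSTRACT_WORDS]
    issues += [f"non-visual intent: {p}" for p in INTENT_PHRASES if p in lowered]
    return issues


print(lint_description("delicious pasta the chef is trying to plate"))
# ['abstract adjective: delicious', 'non-visual intent: trying to']
print(lint_description("person in white chef coat slicing red tomatoes on wooden board"))
# []
```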
Requirements
This project requires data labeling experience, a graduate degree, and a strong understanding of food and cooking videos. The client accepts entry-level contributors, but the graduate-degree requirement is mandatory.
You must be comfortable using Label Studio, following written annotation guides, and working as an independent contractor. There are no mandated hours; work is flexible and on demand.
- Education: Graduate degree (required)
- Experience: Data labeling experience (required)
- Domain knowledge: Familiarity with food/cooking video content (required)
- Tools: Experience or willingness to use Label Studio
- Work arrangement: Contractor, remote, flexible schedule
How It Works & How to Apply
Create an OpenTrain account to apply, build your profile, and get started. You will receive tasks through Label Studio, complete annotations following this guide, and get paid per accepted label at the stated rate.
Because pay is per label, earnings depend on throughput, which varies with task length and complexity. Meeting the quality standards maximizes the number of accepted labels and therefore your earnings.
- Sign up on OpenTrain and complete any qualification tests required by the project
- Work remotely and submit annotations in Label Studio
- Payment is per accepted label at $0.05; contractor agreement applies
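Since only accepted labels are paid, expected earnings follow directly from the rate. The sketch below uses Python's `decimal` module to avoid floating-point rounding on currency. `payout` and `hourly_estimate` are hypothetical helpers, and the throughput and acceptance figures in the usage lines are made-up inputs, not project statistics.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_LABEL = Decimal("0.05")  # stated rate per accepted label
CENTS = Decimal("0.01")


def payout(accepted_labels: int) -> Decimal:
    """Earnings for a batch: only accepted labels are paid."""
    if accepted_labels < 0:
        raise ValueError("accepted_labels cannot be negative")
    return (accepted_labels * RATE_PER_LABEL).quantize(CENTS, rounding=ROUND_HALF_UP)


def hourly_estimate(labels_per_hour: int, acceptance_rate: str) -> Decimal:
    """Expected hourly earnings; acceptance_rate is a decimal string such as "0.9"."""
    accepted_per_hour = Decimal(labels_per_hour) * Decimal(acceptance_rate)
    return (accepted_per_hour * RATE_PER_LABEL).quantize(CENTS, rounding=ROUND_HALF_UP)


print(payout(400))                 # 20.00
print(hourly_estimate(60, "0.9"))  # 2.70
```

Passing the acceptance rate as a string keeps it exact; a float such as 0.9 would carry binary rounding error into the Decimal arithmetic.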