AI Evaluation Task Designer

Design and refine rubric-based evaluation tasks that test AI agent behavior, document outcomes in clear English, and improve scoring methods. This is a remote, part-time contractor role (20+ hrs/week) paying $20–$35/hr; fluent English and strong written precision are required.

Generative AI, RLHF

100% Remote · Hourly
Compensation: $20–$35/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Jun 28, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We help freelancers discover projects, consolidate opportunities across platforms, and build a unified AI training portfolio they control.

Working with OpenTrain means joining the human side of AI development: contributors design, assess, and refine the examples that modern models learn from — an accessible, flexible way to work on cutting-edge technology remotely.

About AI training and evaluation work

AI training (data labeling and human feedback) is the set of tasks people do to teach and shape AI systems. Evaluation work focuses on creating tests, rubrics, and scenarios that reveal how an AI performs in real-world or administrative workflows.

This role centers on clarity, repeatability, and defensible scoring: good evaluations produce unambiguous pass/fail criteria, concise reports, and iterative improvements to the evaluation process.

The role

You will be an independent contractor designing and refining evaluation tasks for AI agents. The work emphasizes precise English writing, structured thinking, and practical use of common SaaS tools, browsers, and document editors.

This is a part-time contractor role requiring 20+ hours per week. It is open to applicants worldwide who are fluent in English.

Employment type: Contractor, Part-time.
Time requirement: 20+ hours/week.
Language: Native or fluent English required.
Data type: Text-based evaluation tasks; label types include evaluation ratings and RLHF.

What you'll do

You will build self-contained evaluation tasks, define success/failure criteria, observe agent behavior, and produce concise summaries and reports. You will also iterate on rubrics and adapt evaluation frameworks as projects evolve.

Create clear prompts, supporting files, and grading rubrics for practical computer-based workflows.
Define unambiguous success and failure criteria for administrative and workflow scenarios.
Observe AI agent behavior, document outcomes, and write concise reports in precise English.
Refine rubrics and scoring methods based on feedback and team collaboration.
Adapt evaluation frameworks to different domains and evolving project requirements.

Requirements

Candidates must demonstrate strong written precision, structured thinking, and the ability to work independently on ambiguous tasks.

Strong written precision and structured thinking.
Native or fluent English writing ability.
Experience designing or applying rubric-based evaluation or scoring frameworks.
Attention to detail and careful observation skills.
Comfort using computers, SaaS tools, web browsers, file management, and document editing platforms.
Ability to work independently on ambiguous projects.

Helpful background

The following backgrounds are not required but will help you succeed and stand out when applying.

Prior experience evaluating AI outputs, RLHF, or model behavior.
Experience refining rubrics or scoring methodologies in quality assurance or evaluation roles.
Exposure to technology-driven process improvement projects or operational workflows.

Compensation, tools, and next steps

Pay is hourly at USD $20–$35, with $35/hr listed as a common rate. As a contractor, you schedule hours to meet the 20+ hours/week expectation. Projects are text-based; the listing gives the labeling software only as "Other", and specific tooling will be shared per project.

To apply, prepare examples of rubric design, short evaluation summaries you’ve written, or notes on process improvements you’ve led. OpenTrain will provide project-specific onboarding and evaluation criteria once hired.

Compensation: USD $20–$35 per hour.
Label types: Evaluation rating, RLHF; data type: Text.
Labeling software: Other; project tooling provided as needed.
Worldwide applicants accepted; must be fluent in English.