AI QA Engineer — LLM Evaluation And Testing

Contract AI QA Engineer to evaluate LLM outputs and test AI-powered apps; BA/BS in CS or related field required. Remote, English required; fixed-price $100 listed on OpenTrain (description also notes $20–$50/hr).

Tags: Generative AI, RLHF
Work arrangement: 100% remote, fixed price
Compensation: $100 fixed price
Eligibility: Open worldwide
Experience: Intermediate
Posted: Jul 2, 2026


About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Create a free account, build a profile that showcases your skills, and apply to AI projects in minutes.

OpenTrain connects people with hands-on roles that shape how real AI systems behave — from annotating data to evaluating model responses — and helps contributors grow in a fast-moving industry.

About AI training and LLM evaluation

AI training (data labeling and evaluation) is the human work behind modern models. For LLMs this often means reviewing model outputs for accuracy, relevance, safety, and instruction following so models improve through iterative feedback.

These roles are typically remote, flexible, and accessible: many projects require only attention to detail, good English, and the ability to follow guidelines, while specialized work rewards domain knowledge.

You will help shape how LLMs respond to prompts by providing consistent, well-reasoned evaluations.
Work is often contractor-based and can fit around other commitments.

The role — AI QA Engineer

We are hiring an AI QA Engineer to test AI-powered web and mobile applications and evaluate LLM-generated responses. You will design and run manual test cases, validate prompts and workflows, identify defects, and provide clear feedback to product and engineering teams.

This is a contractor role open worldwide; English reading and writing are required. The project is listed on OpenTrain as a fixed-price contract for $100 (the job description also lists a salary range of $20–$50/hour).

Work type: Contractor (remote, worldwide).
Data type you'll evaluate: text (label type: EVALUATION_RATING).

What you'll do

Daily tasks focus on hands-on testing and evaluation to ensure AI features meet quality standards.

Test AI-enabled web and mobile features and functional flows.
Evaluate LLM responses for accuracy, relevance, safety, and instruction following.
Create and execute manual test cases, including functional, regression, and exploratory testing.
Report bugs with clear reproduction steps and collaborate with engineers to clarify issues.
Validate prompts and end-to-end AI workflows and provide actionable feedback.

Requirements

Candidates must meet the core qualifications below and follow detailed guidelines when evaluating outputs.

BA/BS (or MS) in Computer Science, Software Engineering, or a related field (required).
Experience in software QA (manual or automation) and knowledge of the software development life cycle (SDLC) and software testing life cycle (STLC).
Strong analytical and problem-solving skills and excellent written English.
Ability to evaluate AI outputs for accuracy, relevance, safety, and instruction following.
Able to follow detailed guidelines, provide consistent evaluations, and explain reasoning clearly.

Preferred skills

These skills are not mandatory but will make you a stronger candidate and may be used when assigning tasks.

Familiarity with ChatGPT, Gemini, Claude, or other LLMs.
Experience with Jira, TestRail, or similar QA tools.
Prompt engineering and AI model evaluation experience.
Experience with LLM testing and REST API testing, plus basic SQL knowledge.
Experience working in Agile/Scrum teams.

Compensation, schedule, and logistics

The job description lists a salary range of $20–$50/hour based on experience; on OpenTrain this project is posted as a fixed-price contract for $100 total. Specific payment timing and milestones will be managed through the project posting.

Time commitment and weekly hours are not specified in the listing; this contractor role is remote and open to applicants worldwide who meet the language and qualification requirements.

Payment: fixed-price $100 (project listing); description also shows $20–$50/hr.
Location: Remote — worldwide applicants welcome.
Language: English required.

How to apply

Sign in to OpenTrain (creating an account is free), complete your profile with relevant experience, and submit an application for this contract.

Highlight your degree, QA experience, familiarity with LLMs or prompt work, and examples of prior testing or evaluation if available. Follow the project instructions on OpenTrain and provide clear, concise responses during any qualification tasks.