OpenTrain AI

Evaluation Scenario Writer - AI Agent Testing Specialist

OpenTrain AI · Remote · Worldwide · Posted Jun 10, 2026

Hourly · $18–$24/hr

We’re looking for an analytical scenario writer with strong QA-style thinking and excellent written English. You should be comfortable designing structured evaluation scenarios, defining expected (“gold standard”) agent behavior, and working with structured formats like JSON/YAML. A background in software testing, QA, data analysis, or NLP annotation is strongly preferred. Basic Python and JavaScript experience is required.

What you’ll be doing:
You’ll design realistic, reusable evaluation scenarios for LLM-based agents that simulate real-world tasks. You’ll define the golden path and acceptable behaviors, annotate task steps and expected outputs, and document edge cases and scoring logic. You’ll also review agent outputs, iterate on scenarios for clarity and coverage, and collaborate with developers and other contributors to test and refine evaluation frameworks.