Evaluation Scenario Writer - AI Agent Testing Specialist
OpenTrain AI · Remote · Worldwide · Posted Jun 10, 2026
We’re looking for an analytical scenario writer with strong QA-style thinking and excellent written English. You should be comfortable designing structured evaluation scenarios, defining expected (“gold standard”) agent behavior, and working with structured formats like JSON/YAML. A background in software testing, QA, data analysis, or NLP annotation is strongly preferred. Basic Python and JavaScript experience is required.
What you’ll be doing:
You’ll design realistic, reusable evaluation scenarios for LLM-based agents that simulate real-world tasks. You’ll define the golden path and acceptable behaviors, annotate task steps and expected outputs, and document edge cases and scoring logic. You’ll also review agent outputs, iterate on scenarios for clarity and coverage, and collaborate with developers and other contributors to test and refine evaluation frameworks.
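To give a feel for the work, here is a minimal sketch of what one scenario and its scoring logic might look like in Python. It is an illustration only, not OpenTrain AI's actual format: the JSON field names, the `Step` and `Scenario` classes, the refund task, and the half-credit rule for acceptable alternatives are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Step:
    """One annotated task step (hypothetical schema)."""
    name: str
    expected: str                                         # golden-path action
    acceptable: list[str] = field(default_factory=list)   # allowed alternatives
    weight: float = 1.0


@dataclass
class Scenario:
    scenario_id: str
    task: str
    steps: list[Step]

    def score(self, agent_actions: list[str]) -> float:
        """Weighted fraction of steps handled acceptably, in [0, 1]."""
        total = sum(step.weight for step in self.steps)
        earned = 0.0
        for i, step in enumerate(self.steps):
            # Edge case: the agent stopped early, so later steps earn nothing.
            action = agent_actions[i] if i < len(agent_actions) else None
            if action == step.expected:
                earned += step.weight            # full credit for the golden path
            elif action in step.acceptable:
                earned += step.weight * 0.5      # assumed half credit for alternatives
        return earned / total if total else 0.0


# A scenario as it might be stored in JSON (all values are made up).
RAW = """
{
  "scenario_id": "refund-001",
  "task": "Customer asks for a refund on a damaged item",
  "steps": [
    {"name": "identify", "expected": "ask_order_id", "acceptable": ["ask_email"]},
    {"name": "policy", "expected": "lookup_refund_policy"},
    {"name": "resolve", "expected": "issue_refund",
     "acceptable": ["escalate_to_human"], "weight": 2.0}
  ]
}
"""

data = json.loads(RAW)
scenario = Scenario(
    scenario_id=data["scenario_id"],
    task=data["task"],
    steps=[Step(**step) for step in data["steps"]],
)

# The agent took an acceptable alternative on step 1, then followed the golden path.
print(scenario.score(["ask_email", "lookup_refund_policy", "issue_refund"]))  # 0.875
# The agent stopped after the first step.
print(scenario.score(["ask_order_id"]))  # 0.25
```

Real scenarios would typically also capture expected outputs, documented edge cases, and free-text rationale for each scoring decision. The exact-match comparison here stands in for whatever matching rules a given evaluation framework uses.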