AI Task Designer & Prompt Engineering Specialist
Worked on multiple AI training projects as both attempter and reviewer, covering the full human-in-the-loop evaluation cycle for large language models. As an attempter, designed prompts that matched predefined metadata (domain, intent, difficulty, and language) and produced structured, step-by-step attempts for coding and web navigation tasks, keeping each scenario realistic and testable. As a reviewer, developed detailed evaluation rubrics with objective criteria covering correctness, code quality, scalability, completeness, and clarity. Evaluated AI-generated responses and human attempts by inspecting reasoning steps, executing code in virtual machine environments when necessary, and validating web navigation behavior in browser-based environments. Ranked responses from best to worst with clear technical justification, identifying edge cases, failure patterns, and reasoning gaps to support continuous model improvement.