Cypher Evals – LLM Evaluation & Rubric-Based Text Assessment
Participated in the Cypher Evals project, which focused on evaluating large language model (LLM) outputs against rubric-based frameworks. My work involved analyzing prompts and comparing AI-generated responses across multiple dimensions, including instruction following, truthfulness, response length, structure, tone, and harmlessness. Each task required identifying the key points in the rubric, applying consistent scoring, and writing clear justifications supported by textual evidence. The project strengthened my expertise in linguistic analysis of both Modern Standard Arabic and Egyptian Arabic, as well as my ability to deliver accurate, high-quality evaluations. I worked through hundreds of prompt–response pairs, ensuring reliable assessment and detailed reporting to support model alignment and fine-tuning.
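For illustration only, here is a minimal sketch of how a single rubric-scored evaluation record could be represented, assuming the six dimensions named above and a 1–5 scoring scale. The class, field names, and scale are hypothetical and do not reflect the project's actual tooling.

```python
# Hypothetical sketch: one way to structure a rubric-based evaluation record.
# Dimensions mirror the ones listed above; the 1-5 scale is an assumption.
from dataclasses import dataclass, field

DIMENSIONS = (
    "instruction_following",
    "truthfulness",
    "response_length",
    "structure",
    "tone",
    "harmlessness",
)

@dataclass
class RubricEvaluation:
    prompt_id: str
    scores: dict = field(default_factory=dict)          # dimension -> score (1-5)
    justifications: dict = field(default_factory=dict)  # dimension -> evidence-backed rationale

    def add(self, dimension: str, score: int, justification: str) -> None:
        """Record a score and its textual justification for one rubric dimension."""
        if dimension not in DIMENSIONS:
            raise ValueError(f"Unknown rubric dimension: {dimension}")
        if not 1 <= score <= 5:
            raise ValueError("This sketch assumes scores on a 1-5 scale")
        self.scores[dimension] = score
        self.justifications[dimension] = justification

    def is_complete(self) -> bool:
        """True once every rubric dimension has been scored."""
        return all(d in self.scores for d in DIMENSIONS)

# Example usage
evaluation = RubricEvaluation(prompt_id="task-001")
evaluation.add("truthfulness", 4, "Claims match the cited source text.")
print(evaluation.is_complete())  # False until all six dimensions are scored
```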