LLM Quality Assurance and Safety Evaluation
Contributed to a continuous, high-volume project focused on enhancing the safety and factual grounding of a large language model (LLM). My core responsibility was prompt-response evaluation and side-by-side comparative rating of model outputs, strictly adhering to complex ethical and linguistic guidelines. Tasks included identifying and rating model hallucinations, bias, and potential safety risks (red teaming). I also contributed to prompt and response refinement for supervised fine-tuning (SFT) to improve Arabic and English conversational flow and accuracy. The work required an analytical, quality-control mindset to maintain an output accuracy rate exceeding 95%.