LLM Evaluation & Rubric-Based Data Annotation
Worked on diverse LLM data annotation and evaluation projects focused on the quality, fairness, and accuracy of AI outputs. Tasks included analyzing thousands of prompt–response pairs, labeling them for correctness, coherence, and policy alignment, and refining prompts and rubrics to strengthen the training signal. Supported red-teaming initiatives by stress-testing models with sensitive and adversarial inputs. Contributed to fine-tuning and reinforcement learning datasets, applying consistent rubric criteria so the data remained reliable and usable downstream. Consistently delivered high-accuracy annotations that supported model improvement and dataset trustworthiness.
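To illustrate the rubric-based labeling described above, here is a minimal sketch of what one annotation record and its validation might look like. The dimension names, the 1–5 scale, and the AnnotationRecord structure are illustrative assumptions, not the actual project schema.

```python
from dataclasses import dataclass

# Assumed rubric dimensions and scale; the real project's rubric is not
# specified here, so these names and the 1-5 range are illustrative only.
RUBRIC_DIMENSIONS = ("correctness", "coherence", "policy_alignment")

@dataclass
class AnnotationRecord:
    prompt: str
    response: str
    scores: dict  # dimension name -> integer score on an assumed 1-5 scale
    notes: str = ""

    def validate(self) -> None:
        """Reject records with missing dimensions or out-of-range scores."""
        for dim in RUBRIC_DIMENSIONS:
            score = self.scores.get(dim)
            if score is None:
                raise ValueError(f"missing rubric dimension: {dim}")
            if not 1 <= score <= 5:
                raise ValueError(f"{dim} score {score} outside 1-5 scale")

# Example usage with a toy prompt-response pair.
record = AnnotationRecord(
    prompt="Explain photosynthesis in one sentence.",
    response="Plants convert light, water, and CO2 into glucose and oxygen.",
    scores={"correctness": 5, "coherence": 5, "policy_alignment": 5},
)
record.validate()
```

Validation of this kind is one way annotation pipelines keep labels consistent across thousands of pairs before they feed fine-tuning or reinforcement learning datasets.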