AI Data Labeling & Evaluation for Large Language Models (LLMs)
Worked on large-scale AI data labeling and evaluation projects aimed at improving LLM accuracy, safety, and response quality. Evaluated AI-generated responses to diverse prompts for factual accuracy, relevance, tone, and policy compliance. Wrote prompt-response pairs for supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and red-teaming scenarios designed to surface edge cases, hallucinations, and unsafe outputs. Completed high-volume annotation tasks in line with strict quality guidelines, reviewer rubrics, and consistency benchmarks, maintaining high accuracy and quality scores across assignments.