LLM Evaluation and Conversational Text Annotation Project
This project involved reviewing AI-generated responses, human conversational text, prompts, and model outputs to evaluate clarity, correctness, coherence, safety, and relevance. I performed content classification, intent identification, sentiment interpretation, rewriting and improvement of model responses, and quality scoring according to task instructions. The work required strict guideline adherence, consistent and detailed reasoning, and high accuracy across multiple annotation batches. I worked mainly on English-language data and contributed to improving reasoning depth, factual alignment, and user-experience quality in LLM interactions. Approximate project size: over 250 tasks completed across multiple batches.