Generative AI Output Evaluation (RLHF & Reasoning)
I evaluated complex generative AI responses for factual accuracy, soundness of reasoning, and safety compliance. The work involved human-in-the-loop (HITL) review in which I identified subtle hallucinations, graded responses for helpfulness, and wrote original "golden" reference responses used to train the model. The role required in-depth research to verify facts and a strong understanding of prompt engineering to probe the model's limitations.