AI Model Evaluator
Evaluated large language model (LLM) outputs across reasoning, coding, instruction-following, and factuality tasks using structured assessment frameworks. Provided expert human feedback, preference ranking, and quality annotations to support model post-training, alignment, and evaluation workflows. Identified edge cases, hallucinations, and failure modes through systematic analysis of model responses and benchmark-driven evaluation.
• Performed LLM output evaluation and structured scoring
• Conducted preference ranking and quality annotations
• Reviewed responses for hallucinations and failure modes
• Supported post-training alignment and evaluation processes