Pod Lead - LLM Python Developer (AI Model Evaluation & Training)
Served as Pod Lead managing AI model evaluation teams for enterprise projects, overseeing the benchmarking and functional assessment of code-generation capabilities. Conducted structured side-by-side (SxS) model evaluations, verifying correctness, prompt adherence, and output quality through detailed justification frameworks. Maintained high annotation and review standards within RLHF pipelines, ensuring evaluation integrity and delivering constructive feedback cycles.
• Designed and implemented multimodal AI training datasets and code-based test cases to improve large language model (LLM) capabilities.
• Evaluated AI-generated code for correctness and reproducibility in command-line, sandboxed (Colab-style) execution environments.
• Delivered agentic code validation and proof-of-work reviews, identifying and refining model behavior through multi-turn prompting.
• Applied rigorous QA processes across Python/Django, PHP/Laravel, Dart/Flutter, SQL, and web stacks in multiple confidential settings.