Ryan Sawyer

PhD AI Evaluator | LLM Quality & Bias Detection Specialist

Seattle, USA
$30.00/hr · Expert · Clickworker · Data Annotation Tech · Labelbox

Key Skills

Software

Clickworker
Data Annotation Tech
Labelbox
Mercor
Mindrift
OneForma
Remotasks
Scale AI
Snorkel AI
Surge AI
Telus
Appen

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Document
Text

Top Task Types

Computer Programming/Coding
Fine-tuning
Prompt + Response Writing (SFT)
Text Generation
Text Summarization

Freelancer Overview

I am a PhD-trained AI Evaluator with over 5 years of experience in data labeling, LLM evaluation, and training dataset development across leading platforms including Surge AI, Outlier, Appen, TELUS International, and RWS. My work spans 5,000+ LLM prompts and 10,000+ annotated entries, with a focus on truthfulness, bias detection, rubric-based evaluation, and quality assurance. I specialize in creating and applying structured rubrics to ensure data accuracy, fairness, and clarity, reducing project error rates by up to 20% while improving model reliability. In addition to strong annotation expertise, I bring skills in prompt engineering, guideline adaptation, and cross-functional collaboration, ensuring datasets align with evolving client needs. I am proficient in Python, SQL, LabelStudio, and Prodigy, with experience applying data-driven insights to improve workflows and training efficiency. My combined background in academic teaching, rubric design, and multilingual data evaluation positions me to deliver consistent, high-quality contributions to AI training projects.

Languages

English (Expert)

Labeling Experience

Appen

LLM Evaluation & Rubric-Based Data Annotation

Appen · Text · Classification · Text Summarization
Worked on diverse LLM data annotation and evaluation projects, focusing on quality, fairness, and accuracy of AI outputs. Tasks included analyzing thousands of prompt–response pairs, labeling for correctness, coherence, and policy alignment, as well as refining prompts and rubrics to strengthen model training. Supported red-teaming initiatives by stress-testing models with sensitive or adversarial inputs. Contributed to fine-tuning and reinforcement learning datasets, ensuring they met high standards of reliability and usability. Consistently delivered error-free annotations that improved model performance and enhanced dataset trustworthiness.

2024
Scale AI

LLM Output Evaluation & Rubric-Based Annotation

Scale AI · Text · Classification · Text Generation
Reviewed and evaluated 5,000+ AI-generated prompts and responses across multiple projects, focusing on accuracy, clarity, fairness, and compliance with client guidelines. Designed and applied rubrics to identify factual inaccuracies, logical inconsistencies, and policy violations. Collaborated with QA teams to ensure 98% rubric compliance and reduced labeling errors by 15–20%. Tasks included red-teaming sensitive prompts, prompt engineering, and fine-tuning evaluations to support safe, reliable AI system development.

2023
Surge AI

LLM Output Evaluation & Rubric-Based Annotation

Surge AI · Text · Text Generation · Text Summarization
Reviewed and scored thousands of large language model (LLM) outputs for factual accuracy, clarity, bias, and policy compliance using detailed rubrics. Collaborated with QA teams to refine evaluation guidelines and reduce inconsistencies. Improved dataset reliability by ensuring >95% adherence to quality standards.

2023
Telus

LLM Output Quality & Bias Evaluation

Telus · Text · Question Answering · Text Generation
Worked on large-scale projects evaluating AI-generated responses across domains including science, education, and general knowledge. Focused on detecting subtle factual inaccuracies, logical inconsistencies, and ethical risks in outputs. Designed structured rubrics that balanced accuracy, safety, and user experience, and contributed to reinforcement learning with human feedback (RLHF) datasets. Completed 4,000+ high-quality annotations, with QA reviews confirming >97% accuracy. Collaborated with cross-functional teams to refine evaluation criteria and improve dataset consistency.

2022
Labelbox

AI Model Evaluation & Fine-Tuning Support

Labelbox · Text · Question Answering · Text Summarization
Worked on an AI training program aimed at improving factual reliability and alignment of LLM responses. Tasks included reviewing 3,500+ model outputs, identifying subtle inaccuracies, policy breaches, and ethical risks. Designed context-specific rubrics to evaluate long-form explanations, summaries, and Q&A tasks. Assisted with data preparation for fine-tuning and conducted red-teaming on sensitive prompts to test system robustness. Achieved a high QA pass rate (99%), while also suggesting refinements to annotation guidelines that improved evaluator consistency across the project.

2021 - 2022

Education

University of Washington

PhD, Computer Science

2022 - 2024
University of Washington

Master of Science, Computer Science

2019 - 2022

Work History

University of Washington

Teaching Assistant, Computer Science Department

Seattle
2019 - 2021