Garland C.

AI Quality & Evaluation Specialist (Contract) — Independent AI Evaluation Projects (Remote)

Atlanta, USA

Key Skills

Software

Other

AWS SageMaker

Top Subject Matter

Healthcare and general-domain LLM evaluation

Clinical NLP and healthcare AI evaluation

Healthcare AI evaluation and ICD-10 coding automation

Top Data Types

Text

Document

Top Task Types

Classification

RLHF

Entity (NER) Classification

Tracking

Freelancer Overview

AI Quality & Evaluation Specialist (Contract), Independent AI Evaluation Projects (Remote). Brings 7+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include AWS SageMaker. Education includes a Doctor of Philosophy and a Master of Science, both from the Georgia Institute of Technology (2026). AI-training focus covers data types such as Text, Medical, and DICOM, and labeling workflows including Evaluation, Rating, and Fine Tuning.

Labeling Experience

Graduate Research Assistant (AI & Clinical NLP) - Georgia Institute of Technology

AWS SageMaker

Text

Classification

RLHF

Worked as a graduate research assistant developing LLM evaluation pipelines for clinical question-answering and medical AI assistant use cases. Created multi-stage prompt evaluation frameworks grounded in established clinical guidelines and evidence-based practices. Measured hallucination rates, calibration reliability, reasoning consistency, and fairness characteristics across healthcare datasets while collaborating on AI safety and compliance efforts.
• Designed and benchmarked LLM evaluation pipelines for healthcare NLP.
• Developed prompt evaluation frameworks using CDC, AHA, and clinical guidance.
• Performed bias detection and fairness analysis in AI-generated clinical summaries and triage outputs.
• Supported research on RAG, human-in-the-loop evaluation, and AI safety/compliance initiatives.

2024 - Present

Graduate Research Assistant — AI & Clinical NLP Lab (Georgia Institute of Technology)

Other

Text

Designed and benchmarked LLM evaluation pipelines for clinical question-answering systems and medical AI assistants. Developed multi-stage prompt evaluation frameworks grounded in CDC, AHA, and evidence-based clinical guidelines to assess reliability and reasoning quality. Conducted hallucination-rate, calibration, and bias/fairness analyses using healthcare datasets, including contributions to human-in-the-loop evaluation systems.
• Evaluation of hallucination rates and calibration reliability
• Reasoning consistency scoring on healthcare datasets
• Bias detection and fairness analysis for clinical summaries and triage
• Support for RAG and human-in-the-loop AI evaluation workflows

2024 - Present

AI Quality & Evaluation Specialist (Contract) - Independent AI Evaluation Projects

AWS SageMaker

Provided AI quality evaluation and reliability testing for a wide range of prompts, including medical, technical, and general-domain content. Assessed factual accuracy, coherence, reasoning quality, hallucination risk, bias patterns, and safety compliance for model outputs. Designed an evaluation structure that supports robust quality assurance and model improvement workflows across different LLMs.
• Evaluated thousands of AI-generated responses and documented performance trends.
• Built annotation rubrics, evaluation guidelines, and quality scoring systems.
• Conducted prompt-response audits and side-by-side model comparisons.
• Contributed to structured feedback datasets to support prompt optimization and fine-tuning initiatives.

2022 - Present

AI Quality & Evaluation Specialist (Contract) — Independent AI Evaluation Projects (Remote)

Other

Text

Evaluated thousands of AI-generated responses across medical, technical, and general-domain prompts for quality and safety attributes. Built annotation rubrics and quality scoring systems to support LLM response benchmarking and reinforcement learning workflows. Performed prompt-response audits to identify hallucinations, logical inconsistencies, bias patterns, and unsafe outputs, then documented failure analyses in written reports.
• LLM response factual accuracy and coherence scoring
• Safety compliance and unsafe output identification
• Bias detection and hallucination risk assessment
• Side-by-side model comparisons to measure evaluation outcomes

2022 - Present

Research Intern (Health AI) - IBM Research

AWS SageMaker

Text

Classification

Tracking

Contributed to healthcare AI evaluation efforts by building Python preprocessing pipelines for model assessment and ICD-10 coding automation. Developed performance dashboards to analyze model outputs, demographic disparities, and classification accuracy. Assisted with evaluating transformer-based NLP systems for consistency, fairness, and operational performance, then communicated results to engineering and clinical stakeholders.
• Implemented preprocessing pipelines to support healthcare AI evaluation workflows.
• Built dashboards tracking demographic disparities and classification accuracy metrics.
• Evaluated transformer NLP systems for fairness, consistency, and performance.
• Presented evaluation findings and optimization recommendations to teams.

2023

Education

Georgia Institute of Technology

Master of Science, Computer Science

Master of Science

2024 - 2026

University of West Georgia

Bachelor of Science, Computer Science

Bachelor of Science

2020 - 2024

Work History

Georgia Institute of Technology

Graduate Research Assistant (AI & Clinical NLP)

Atlanta

2024 - Present

Independent AI Evaluation Projects

AI Quality & Evaluation Specialist (Contract)

Atlanta

2022 - Present