OpenTrain Blog | AI Training and Data Labeling

Red teaming | Jun 10, 2026 | 12 min read

AI Red Teaming as an Evaluation Data Problem

AI red teaming is useful when adversarial findings become reproducible evaluation data: threat models, rubrics, adjudication, leakage controls, and routing decisions.

Evaluation systems | Jun 9, 2026 | 10 min read

Process Reward Models vs Outcome Reward Models for Reasoning Systems

A technical reference on process versus outcome reward models, verifier reliability, benchmark transfer, reward hacking, and hybrid supervision.

Post-training | Jun 8, 2026 | 11 min read

GRPO for Reasoning-Model Post-Training

A technical reference on what GRPO changes, what it does not measure, and why verifier quality, pass@k, contamination control, and human-audited slices still matter.

Post-training | Jun 4, 2026 | 9 min read

RLAIF vs RLHF: What AI Feedback Can and Cannot Replace

Where AI feedback can scale post-training supervision, and where human-grounded objectives, calibration, expert review, and holdouts remain essential.

Post-training | Jun 3, 2026 | 9 min read

Direct Preference Optimization vs PPO after RLHF

A technical reference on what DPO changes after RLHF, where PPO and online data still matter, and why preference measurement remains the hard part.

Evaluation systems | Jun 1, 2026 | 8 min read

LLM Judges Are Measurement Systems, Not Oracles

Evidence-based technical reference on when LLM judges are reliable enough for production evals and post-training, and how to calibrate, audit, and gate them.

Operating guides | May 22, 2026 | 7 min read

How to scope an RLHF data program

A practical framework for launching an RLHF program: define queue geometry, size raters from observed throughput, budget the review loop, and run weekly refresh gates.

Technical notes for AI teams hiring human feedback experts
