OpenTrain AI | For AI Companies

Technical notes for AI teams hiring human feedback experts

Practical reads on RLHF, evaluation data, red teaming, reward models, and the operating work behind reliable human feedback programs.

All posts

Browse OpenTrain posts by audience and topic. Current coverage starts with RLHF, evaluation data, red teaming, reward models, and project scoping.

7 posts for AI Builders

Red teaming · Jun 10, 2026 · 12 min read

AI Red Teaming as an Evaluation Data Problem

AI red teaming is useful when adversarial findings become reproducible evaluation data: threat models, rubrics, adjudication, leakage controls, and routing decisions.

Evaluation systems · Jun 9, 2026 · 10 min read

Process Reward Models vs Outcome Reward Models for Reasoning Systems

A technical reference on process versus outcome reward models, verifier reliability, benchmark transfer, reward hacking, and hybrid supervision.

Post-training · Jun 8, 2026 · 11 min read

GRPO for Reasoning-Model Post-Training

A technical reference on what GRPO changes, what it does not measure, and why verifier quality, pass@k, contamination control, and human-audited slices still matter.

Post-training · Jun 4, 2026 · 9 min read

RLAIF vs RLHF: What AI Feedback Can and Cannot Replace

Where AI feedback can scale post-training supervision, and where human-grounded objectives, calibration, expert review, and holdouts remain essential.

Post-training · Jun 3, 2026 · 9 min read

Direct Preference Optimization vs PPO after RLHF

A technical reference on what DPO changes after RLHF, where PPO and online data still matter, and why preference measurement remains the hard part.

Evaluation systems · Jun 1, 2026 · 8 min read

LLM Judges Are Measurement Systems, Not Oracles

Evidence-based technical reference on when LLM judges are reliable enough for production evals and post-training, and how to calibrate, audit, and gate them.

Operating guides · May 22, 2026 · 7 min read

How to scope an RLHF data program

A practical framework for launching an RLHF program: define queue geometry, size raters from observed throughput, budget the review loop, and run weekly refresh gates.
