Technical notes for AI teams hiring human feedback experts
Practical reads on RLHF, evaluation data, red teaming, reward models, and the operating work behind reliable human feedback programs.
All posts
Browse OpenTrain posts by audience and topic. Current coverage starts with RLHF, evaluation data, red teaming, reward models, and project scoping.
7 posts for AI Builders
AI Red Teaming as an Evaluation Data Problem
AI red teaming is useful when adversarial findings become reproducible evaluation data: threat models, rubrics, adjudication, leakage controls, and routing decisions.
Process Reward Models vs Outcome Reward Models for Reasoning Systems
A technical reference on process versus outcome reward models, verifier reliability, benchmark transfer, reward hacking, and hybrid supervision.
GRPO for Reasoning-Model Post-Training
A technical reference on what GRPO changes, what it does not measure, and why verifier quality, pass@k, contamination control, and human-audited slices still matter.
RLAIF vs RLHF: What AI Feedback Can and Cannot Replace
Where AI feedback can scale post-training supervision, and where human-grounded objectives, calibration, expert review, and holdouts remain essential.
Direct Preference Optimization vs PPO after RLHF
A technical reference on what DPO changes after RLHF, where PPO and online data still matter, and why preference measurement remains the hard part.
LLM Judges Are Measurement Systems, Not Oracles
An evidence-based technical reference on when LLM judges are reliable enough for production evals and post-training, and how to calibrate, audit, and gate them.
How to Scope an RLHF Data Program
A practical framework for launching an RLHF program: define queue geometry, size raters from observed throughput, budget the review loop, and run weekly refresh gates.