AI Data Labeller — Atlas (RLHF Feedback/Response Ranking)
Performed Reinforcement Learning from Human Feedback (RLHF) annotation by rating and ranking AI-generated responses. Evaluated responses on helpfulness, harmlessness, and honesty to generate training data for large language model alignment.
• Rated responses to capture quality and safety alignment signals.
• Ranked multiple responses to support preference learning.
• Applied safety/policy considerations when flagging non-compliant content.
• Fed curated feedback into reward model training workflows.