Bilingual (Japanese/English) AI Safety Data Reviewer
Remote contract role reviewing AI-generated content for safety, correctness, and reasoning quality in Japanese and English. Pay is $27–$31/hr for 20+ hours/week. You will draw on senior trust & safety and red‑teaming experience to rate, compare, and improve model outputs.
Generative AI & RLHF
Compensation: $27–$31/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Apr 3, 2026
Open worldwide
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect people to projects that teach and shape AI behavior — from annotating text and audio to evaluating model safety — and help contributors build profiles and apply quickly.
About AI training and safety work
AI training (data labeling / human feedback) is the human side of building modern AI. Reviewers and evaluators supply the careful judgments models learn from — assessing accuracy, policy alignment, and cultural nuance so systems behave safely in the real world.
This role focuses on generative AI safety: evaluating model reasoning, identifying adversarial or unsafe outputs, and recommending mitigations so models avoid producing harmful or misleading content.
The role
We are hiring a remote, hourly-paid contractor to review AI-generated content in Japanese and English, evaluate step-by-step reasoning, and make nuanced safety judgments. This is a part-time contract role requiring 20+ hours per week.
Pay: USD $30/hr (typical), with an allowable range of $27–$31/hr. Work is fully remote and worldwide; you may be asked to handle explicit, violent, or otherwise disturbing content in a secure environment.
- Employment type: Contractor, Part-time
- Time commitment: 20+ hours/week
- Compensation: $27–$31 USD per hour (typical $30/hr)
What you'll do
You will evaluate model outputs for correctness, clarity, and safety; compare multiple responses; and write clear, reproducible rationales for your ratings and remediation recommendations.
- Assess problem-solving and step-by-step reasoning for accuracy and logical coherence across Japanese and English content.
- Spot methodological, conceptual, or factual errors and fact-check when necessary.
- Rate or compare multiple model responses by safety, policy alignment, and user impact.
- Identify adversarial edge cases, perform red‑team-style testing, and recommend concrete mitigations.
- Document decisions with clear, reproducible rationales suitable for policy and model-improvement teams.
Requirements (must-have)
This role requires strong bilingual skills, senior trust & safety experience, and demonstrated LLM red‑teaming or adversarial testing capability.
- Near-native or native proficiency in written Japanese.
- Minimum C1 proficiency in written English.
- Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience.
- Senior-level experience in Trust & Safety, content moderation, policy operations, risk/compliance, investigations, or related safety functions.
- Proven LLM red‑teaming or adversarial testing experience, including identifying edge cases and recommending mitigations.
- Strong knowledge of safety domains: hate/harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities and code, and misinformation.
- Experience applying policy standards consistently across Japanese and English, including cultural nuance, slang, coded language, and context shifts.
- Strong analytical writing with clear, reproducible rationales for moderation or safety decisions.
- Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote work environment.
Preferred qualifications
These qualifications are helpful but not strictly required.
- Localization or translation experience, with ability to preserve meaning, severity, and intent across languages.
- Experience working with evaluation or annotation tools and producing structured feedback for model teams.
How selection and work will proceed
Applicants will complete a language and moderation exercise to demonstrate judgment across Japanese and English. Selected reviewers receive task instructions, policy guides, and secure access to the annotation environment.
Your feedback will directly influence model safety: ratings and red‑teaming notes feed into policy updates and model improvements.
- Label types: evaluation ratings and text-generation review.
- Tooling: proprietary tool provided by the project.
- Worldwide applicants accepted; contractors must follow secure handling instructions for sensitive content.