Bilingual (Japanese/English) AI Safety Data Reviewer

Remote contract role reviewing AI-generated content for safety, correctness, and reasoning in Japanese and English — $27–$31/hr, 20+ hours/week. Use senior trust & safety and red‑teaming experience to rate, compare, and improve model outputs.

Generative AI & RLHF

100% Remote Hourly · $27–$31/hr

Compensation: $27–$31/hr
Eligibility: Open worldwide
Experience: Intermediate
Posted: Apr 3, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect people to projects that teach and shape AI behavior — from annotating text and audio to evaluating model safety — and help contributors build profiles and apply quickly.

About AI training and safety work

AI training (data labeling / human feedback) is the human side of building modern AI. Reviewers and evaluators supply the careful judgments models learn from — assessing accuracy, policy alignment, and cultural nuance so systems behave safely in the real world.

This role focuses on generative AI safety: evaluating model reasoning, identifying adversarial or unsafe outputs, and recommending mitigations so models avoid producing harmful or misleading content.

The role

We are hiring a remote, hourly-paid contractor to review AI-generated content in Japanese and English, evaluate step-by-step reasoning, and make nuanced safety judgments. This is a part-time contract role requiring 20+ hours per week.

Pay: typically USD $30/hr, within an allowable range of $27–$31/hr. The work is fully remote and open to applicants worldwide. You may be asked to handle explicit, violent, or otherwise disturbing content in a secure environment.

Employment type: Contractor, Part-time
Time commitment: 20+ hours/week
Compensation: $27–$31 USD per hour (typical $30/hr)

What you'll do

You will evaluate model outputs for correctness, clarity, and safety; compare multiple responses; and create clear, reproducible rationales for ratings and remediation recommendations.

Assess problem-solving and step-by-step reasoning for accuracy and logical coherence across Japanese and English content.
Spot methodological, conceptual, or factual errors and fact-check when necessary.
Rate or compare multiple model responses by safety, policy alignment, and user impact.
Identify adversarial edge cases, perform red‑teaming style testing, and recommend concrete mitigations.
Document decisions with clear, reproducible rationales suitable for policy and model-improvement teams.

Requirements (must-have)

This role requires strong bilingual skills, senior trust & safety experience, and demonstrated LLM red‑teaming or adversarial testing capability.

Near-native or native proficiency in written Japanese.
Minimum C1 proficiency in written English.
Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience.
Senior-level experience in Trust & Safety, content moderation, policy operations, risk/compliance, investigations, or related safety functions.
Proven LLM red‑teaming or adversarial testing experience, including identifying edge cases and recommending mitigations.
Strong knowledge of safety domains: hate/harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities and code, and misinformation.
Experience applying policy standards consistently across Japanese and English, including cultural nuance, slang, coded language, and context shifts.
Strong analytical writing with clear, reproducible rationales for moderation or safety decisions.
Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote work environment.

Preferred qualifications

These qualifications are helpful but not strictly required.

Localization or translation experience, with ability to preserve meaning, severity, and intent across languages.
Experience working with evaluation or annotation tools and producing structured feedback for model teams.

How selection and work will proceed

Applicants will complete a language and moderation exercise to demonstrate judgment across Japanese and English. Selected reviewers receive task instructions, policy guides, and secure access to the annotation environment.

Your feedback will directly influence model safety: ratings and red‑teaming notes feed into policy updates and model improvements.

Label types: evaluation ratings and text-generation review.
Annotation tool: proprietary tool provided by the project.
Worldwide applicants accepted; contractors must follow secure handling instructions for sensitive content.